RESEARCH OVERVIEW


This report covers the period from October 1, 1984 through March 31, 1985.
The research discussed here is described in more detail in several published
and unpublished reports cited below.

Several fundamental bounds on the complexity of network architecture,
parallel computation, VLSI design, and algorithms have been established and/or
improved during this period. The grid-matching problem, of importance to
wafer-scale integration, is close to solution. Improved algorithms for
two-layer channel routing have been developed.

The "fat-tree" interconnection network has been studied further, and a
better algorithm for on-line routing of messages in this network has been
developed. There is continued interest in compaction, and a provably fast
algorithm for solving constraint systems has been devised.

The CAD frame Schema has been solidified in several ways during this
period. It is now possible to use Schema as a schematic capture and data storage
system. Better support is being developed for PC-board designs. Some
advanced ideas in describing waveforms qualitatively are being incorporated.

A novel PROM device that is UV-enabled for writing has been designed and
tested. The tradeoff between speed and fault probability in A/D converters
has been viewed from a new angle. The same tradeoffs have been investigated
for inverters, in an embryonic study of reliability software. Development of
CAD tools for the IBM PC has continued.

The previously reported bounds for interconnect delay in MOS circuits
have been improved in several ways. Some bounds now pertain to RC meshes
rather than RC trees; some hold with resistors to ground. Tighter bounds have
been found by exploiting slew-rate limits on node voltages.
THE DESIGN OF SCHEMA


During the past six months, Schema has been solidified in a number of
ways. The database and schematic capture tools have stabilized to the point
that people are now beginning to use Schema to enter and store designs which will
be maintained for long periods of time. Our colleagues at Harris are confident
enough in the architecture that they are developing the first application
tool to be built on top of Schema rather than as an integral part of Schema. In
addition, a number of the internal components that are needed for VLSI design
are being put into place.

Aanja Kohli has enhanced the schematic capture portion of the system so
that it can handle logic symbols with an arbitrary number of inputs, and is
developing a basic spatial management system for schematics. This tool will
permit the creation of routing programs for schematics and may also be used
for gate arrays. In addition, it is used to manage the placement of text on
the screen and generally improve the aesthetics of the designs as entered by
hand.


Our colleagues at Harris, Inc. are developing a wirewrap/PC board
development tool on Schema. This package makes use of Schema's newly enhanced
ability to deal with logic diagrams and hierarchical designs. There are three
phases to the project. First, the logic schematic is converted to a
technology-specific diagram by binding gates to particular implementations, e.g. a NAND
gate is converted to a 74S00. Second, the gates are partitioned into packages
and the packages are placed. At this point a wire wrap board can be created.
Finally, a more detailed adjustment of the placement is made and the signals
are assigned layers and routed. This last phase is being done by Don Becker
here at MIT.

Brian Williams has been refining his temporal constraint propagation
tools in preparation for their incorporation into Schema. Margaret St. Pierre
has begun specifying the waveform representations and simulation interface for
Schema. Unlike previous versions of these representations, Margaret's will
permit qualitative values to be used both for the time specification and for
the value. The basic idea is that a waveform is a mapping between time and
some value space. The value space can be a continuous quantitative domain, as
is used by Spice for voltages and currents; a discrete quantitative domain,
as is used in logic simulation; or a qualitative domain, as is used by Brian
Williams' qualitative reasoning system. When a waveform is asked for the value
at some time, the time value can be any open or closed interval, including a
point. If the time value is not a point, the value returned may be a
qualitative value. This mechanism will greatly ease the effort required to
incorporate qualitative reasoning mechanisms into Schema.
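
The waveform idea described above (a mapping from time to a value space, queried at a point or over an interval) can be sketched as follows. This is only an illustration and not Schema's actual representation: the names `Interval` and `Waveform`, the piecewise-linear storage, and the qualitative labels "rising", "falling", "steady" and "changing" are all assumptions made for the example, and open versus closed endpoints are not distinguished.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A time interval [lo, hi]; hypothetical helper, not a Schema type."""
    lo: float
    hi: float


class Waveform:
    """Piecewise-linear mapping from time to a numeric value (assumed model)."""

    def __init__(self, samples):
        # samples: (time, value) pairs with strictly increasing times
        self.samples = sorted(samples)

    def _point(self, t):
        pts = self.samples
        if t <= pts[0][0]:
            return pts[0][1]
        if t >= pts[-1][0]:
            return pts[-1][1]
        for (t0, v0), (t1, v1) in zip(pts, pts[1:]):
            if t0 <= t <= t1:
                return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

    def value_at(self, when):
        """A point query returns a number; an interval query returns a
        qualitative label summarizing the behavior over the interval."""
        if not isinstance(when, Interval):
            return self._point(when)
        if when.lo == when.hi:
            return self._point(when.lo)
        inner = [t for t, _ in self.samples if when.lo < t < when.hi]
        values = [self._point(t) for t in [when.lo] + inner + [when.hi]]
        steps = [b - a for a, b in zip(values, values[1:])]
        if all(s == 0 for s in steps):
            return "steady"
        if all(s >= 0 for s in steps):
            return "rising"
        if all(s <= 0 for s in steps):
            return "falling"
        return "changing"


w = Waveform([(0, 0), (1, 5), (2, 5), (3, 0)])
print(w.value_at(0.5), w.value_at(Interval(0, 1)), w.value_at(Interval(0, 3)))
```

The point of the design is that a simulator and a qualitative reasoner can share one query interface: a quantitative tool asks about points, while a qualitative tool asks about intervals and receives a symbolic answer.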
THE WAVEFORM BOUNDING APPROACH TO TIMING ANALYSIS


Our work since October 1, 1984 has been concentrated on bounds for signal
delay in linear RC models for on-chip interconnect. Taking as a starting
point the work of Rubinstein, Penfield, and Horowitz (IEEE Trans. CAD, July
1983), we have been able to include more general networks than RC trees driven
by voltage step inputs. We have also succeeded in reducing the region of
uncertainty in the original bounds for certain classes of networks of practical
interest.


One extension we have completed is a method of bounding the response of
RC meshes, which are more general than RC trees in that resistor loops are
allowed. These networks are important in practice 1) as models for the gates
of large MOS pad driver transistors, 2) whenever linear resistor models are
used for transistors in logic gates or CMOS pass gates, as in Chris Terman's
program RSIM, and 3) to model interconnect networks with closed loops
sometimes created by automatic routing programs. Another successful extension is
to networks with resistive paths to ground, which are appropriate models for,
e.g., interconnect to bipolar logic gates.

Tighter bounds have been achieved for unbranched lines and certain
classes of RC trees by exploiting slew rate limits on the node voltages and
exploiting the spatial convexity of interconnect voltage during transients in
a novel way.

Two master's-level graduate students, Ray Schnitzler and David Standley,
are being supported by this contract.
HIGH PERFORMANCE CIRCUIT DESIGN


Progress was made in four areas: PROMs, A/D converters, understanding
noise margin/speed/reliability tradeoffs, and PC microtools.

We have recently demonstrated a new type of programmable read-only memory
(PROM). The invention allows users of conventional nMOS and CMOS processes,
such as those available through MOSIS, to place several hundred bits of
electrically alterable read-only memory on any custom VLSI chip. Typical
applications might include the storage of cryptographic codes, special addresses,
calibration data, or repair locations for fault-tolerant systems.

The size of the new non-volatile PROM cell is about twice the size of a
conventional static (but volatile) memory cell. The programming process,
while it does not require the use of high voltages or special processing, must
be done in the presence of ultraviolet (UV) light. Each cell can be
individually written while the UV light floods the entire chip.

The experimental chips were fabricated through MOSIS in 4 micron nMOS.
Write times on the order of ten minutes were observed. So far, the cells have
retained their state for months, and years of storage is projected. We hope
the new PROM, which is not seen as replacing commercial EEPROMs built with
special processes, will find wide system application where the use of exotic
and expensive processes for just a few bits of field programmable,
non-volatile storage is uneconomical or infeasible.

In the area of A/D conversion, we examined the fundamental limits on the
speed of A/D converters as a function of the probability of a fault. This
fault problem is completely analogous to the synchronizer problem in digital
circuits. After all, if one could build a perfect A/D converter, one could
build a perfect synchronizer. We have found that, for extremely high levels
of reliability, flash and self-timed successive approximation converters are
equally slow because they both spend virtually all their time resolving the
one hard bit.

We have continued our investigation of tradeoffs between speed and
reliability. We have discovered that the lower bound on inverter pair delay
increases by 50% as the noise margins are increased from zero to their maximum
values of half the power supply rail. We have also investigated the transient
step response of inverters and seen tradeoffs between the reliability measure
(the noise margin divided by the worst-case noise) and the ultimate speed.

We have continued our back-burner effort on VLSI microtools, a set of
programs for helping with the early design stages of a VLSI chip. These tools
run on IBM PCs and use either LOTUS 1-2-3 or TK!Solver. They solve such
problems as finding the temperature rise in a metal line or its fringing
capacitance as a function of the wire geometry. They also do characteristic
impedance calculations for PCBs. Community members may have copies of these
programs for free, and at their own extreme risk, by sending me a diskette. We
promise bugs for all.
Professor Leighton is continuing research on several problems involving
novel network architectures, parallel computation, VLSI design and the
development of algorithms for NP-complete problems which provably work well on the
average. Advances have been made in several areas during the past six months.
Highlights are described in the following paragraphs.

In the algorithms area, Professors Leighton and Sipser, Thang Bui and
Soma Chaudhuri (University of Washington at Seattle) have developed graph
bisection algorithms which (provably) almost always find the minimum bisection
of graphs with small bisections. These algorithms perform dramatically better
than known techniques for large classes of relevant graphs. The work will
form an important part of Thang Bui's PhD thesis, which should be completed by
this summer.

In the area of fault-tolerant construction of VLSI networks, Professors
Leighton and Rosenberg (Duke University) and Dr. Chung (Bell Communications
Research Labs) have developed efficient algorithms and bounds for representing
useful networks as a small number of "stacks" of wires. As the stacks are
easily implemented in VLSI, the results make possible the efficient
configuration of fault-free networks in environments that contain defective components.

In related work, Professor Leighton and Peter Shor (who is expected to
finish his PhD thesis this summer) are close to solving the grid matching
problem. Roughly stated, the problem is to determine the expected value of the
minimum, over all perfect matchings of N random points to N fixed points, of
the maximum edge length, where the fixed points are arranged in an
$N^{1/2} \times N^{1/2}$ grid with unit spacing between consecutive rows and
columns. Professors Leighton and Leiserson proved an upper bound of
$O(\log N)$ and a lower bound of $\Omega((\log N)^{1/2})$ for
this problem in their work on wafer-scale integration of systolic arrays in
1982. Determination of the precise bound has remained a difficult and
important open problem ever since. It now appears that the exact bound for the
grid matching problem is $\Theta(\log^{3/4} N)$, improving both the upper and lower
bounds. As a direct result of this work, it will be possible to improve the
best bounds known for the average case behavior of algorithms for wafer-scale
integration as well as for a variety of other packing and assignment problems.

Professor Leighton and Johan Hastad (a first year graduate student) are
developing efficient circuits for parallel division. Currently, the only
known circuit that can compute the N most significant bits of a quotient in
$O(\log N)$ parallel steps requires O(N*) processors. Preliminary work by
Leighton and Hastad indicates that the number of processors can be decreased
to $O(N^{1+\epsilon})$ where $\epsilon$ is an arbitrarily small positive constant. Although
not yet practical, the improvement in hardware requirements is significant.

Professor Leighton and Bonnie Berger (another first year graduate
student) are developing improved algorithms for 2-layer channel routing. Initial
progress in this area suggests that it may be possible to achieve the
performance of the Baker-Bhatt-Leighton Manhattan routing algorithm and the
Rivest-Baratz-Miller knock-knee routing algorithm with a single, simpler algorithm.
More importantly, it appears that the new algorithm can be extended to the
unit-vertical-overlap model (in which wires can overlap only for unit distance
and only in the vertical direction) where a factor of two in channel width can
be saved. The factor of two is significant because the new algorithm always
routes 2-point net channels with width $d + o(d)$ instead of the best previously
known bound of $2d - 1$. Here d denotes the density of the channel which, of
course, is a lower bound on channel width. The results also hold for
multipoint net problems, except that an additional factor of two in channel width
is required.
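
The density d used in these bounds can be computed with a simple sweep over the columns of the channel. The sketch below is an illustration only: it assumes each 2-point net is given as a pair of column numbers, that a net occupies the closed column interval between its two terminals, and that a net whose terminals share a column needs no track. The function name `channel_density` is hypothetical.

```python
def channel_density(nets):
    """Return the channel density d: the largest number of nets that
    cross any single column.

    nets: list of (col_a, col_b) pairs, one per 2-point net (assumed format).
    """
    events = []
    for a, b in nets:
        lo, hi = min(a, b), max(a, b)
        if lo == hi:
            continue  # assumption: a straight-across net uses no track
        events.append((lo, 0, +1))  # at equal columns, openings sort first
        events.append((hi, 1, -1))
    density = current = 0
    for _, _, delta in sorted(events):
        current += delta
        density = max(density, current)
    return density


nets = [(1, 4), (2, 6), (3, 5), (7, 9)]
d = channel_density(nets)
print(d, 2 * d - 1)  # density, and the older 2d - 1 width bound
```

Since d is a lower bound on channel width, comparing a router's track count against `channel_density` gives a quick measure of how far a routing is from optimal.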

Peter Shor has been investigating the average-case behavior of bin
packing algorithms. In the case where the item sizes are uniformly distributed,
he has derived much tighter bounds on the wasted space produced by the
algorithm First Fit than were previously known, and has the exact answer, up to a
constant, for the wasted space produced by the algorithm Best Fit. He has
also derived a lower bound for any on-line algorithm, which shows that on-line
algorithms cannot do as well as off-line algorithms, and that Best Fit comes
within a small factor of being optimal among on-line algorithms.
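
For readers unfamiliar with the two on-line heuristics named above, the following sketch shows First Fit and Best Fit and the "wasted space" they produce (total unused capacity in the bins opened). It is a plain textbook rendering, not the analysis carried out by Shor; the function names and the small tolerance `EPS` are assumptions of the example.

```python
EPS = 1e-12  # tolerance for floating-point item sizes (assumed)


def first_fit(items, capacity=1.0):
    """Place each item in the first open bin with room; return free space per bin."""
    free = []
    for x in items:
        for i, room in enumerate(free):
            if x <= room + EPS:
                free[i] = room - x
                break
        else:
            free.append(capacity - x)  # no bin had room: open a new one
    return free


def best_fit(items, capacity=1.0):
    """Place each item in the fullest open bin that still has room."""
    free = []
    for x in items:
        best = None
        for i, room in enumerate(free):
            if x <= room + EPS and (best is None or room < free[best]):
                best = i
        if best is None:
            free.append(capacity - x)
        else:
            free[best] -= x
    return free


def wasted_space(free):
    """Total unused capacity over all opened bins."""
    return sum(free)


items = [6, 7, 3, 4]
print(len(first_fit(items, 10)), wasted_space(first_fit(items, 10)))
print(len(best_fit(items, 10)), wasted_space(best_fit(items, 10)))
```

On this small instance First Fit opens three bins while Best Fit opens two; the average-case results described above concern how the wasted space grows when item sizes are drawn uniformly at random.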

Charles Leiserson and Ron Greenberg have further improved their algorithm
for on-line routing of messages in the "fat-tree" interconnection network.
This probabilistic algorithm is novel in that it does not randomize in the
choice of message paths or in the operation of the switches, but rather in the
choice of whether or not to send a particular message in a particular delivery
cycle. The algorithm ensures that a set of messages, M, can be routed with
high probability within $O(\lambda(M) \log |M|)$ delivery cycles, where $\lambda(M)$ is
the maximum over all communication links of the ratio of the number of
messages in M which must pass through the link to the capacity of the link. This
work may also have some applicability to routing networks other than
"fat-trees."
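
The load factor $\lambda(M)$ defined above can be computed directly from a message set. The sketch below assumes a complete binary tree with processors at the leaves, heap-style node numbering (root 1, leaves `n_leaves` to `2 * n_leaves - 1`), one upward and one downward channel per tree edge, and a caller-supplied capacity function. The capacity profile in `doubling_capacity` (capacity doubling at each level toward the root) is purely an assumption for illustration and is not claimed to be the capacity rule of any particular fat-tree design.

```python
from collections import Counter


def load_factor(n_leaves, messages, capacity):
    """lambda(M): max over channels of (messages through channel) / capacity.

    messages: list of (src_leaf, dst_leaf) with leaves numbered 0..n_leaves-1.
    capacity: function mapping a node to the capacity of the channel
              between that node and its parent (assumed interface).
    """
    load = Counter()
    for src, dst in messages:
        a, b = src + n_leaves, dst + n_leaves  # heap indices of the leaves
        while a != b:  # climb both ends until they meet at the common ancestor
            load[(a, "up")] += 1    # channel from a up to its parent
            load[(b, "down")] += 1  # channel from b's parent down to b
            a //= 2
            b //= 2
    return max((count / capacity(node) for (node, _), count in load.items()),
               default=0.0)


def doubling_capacity(n_leaves):
    """Hypothetical profile: capacity 1 at the leaves, doubling per level."""
    height = n_leaves.bit_length() - 1

    def capacity(node):
        depth = node.bit_length() - 1
        return 2 ** (height - depth)

    return capacity


msgs = [(0, 3), (1, 2)]
print(load_factor(4, msgs, lambda node: 1))
print(load_factor(4, msgs, doubling_capacity(4)))
```

Both messages cross the root in this example, so with unit capacities the channels just below the root carry a load of 2, while the fatter channels of the second profile bring the load factor down to 1.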


Miller Maley has developed a provably fast algorithm for solving
constraint systems in VLSI layout compaction. Constraint solving is usually done
by the Bellman-Ford algorithm, which has $O(|V||E|)$ running time in the worst
case. Heuristics have been developed which allow the systems of constraints
arising in compaction to be solved much more quickly than this bound. The new
algorithm runs in $O(|E| + |V| \log |V|)$ time in the worst case.
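
As a point of reference for the baseline mentioned above, the following sketch solves a compaction-style constraint system with Bellman-Ford relaxation. It is not Maley's algorithm. It assumes every constraint has the form x[j] - x[i] >= d (a minimum separation, with negative d encoding a maximum separation in the other direction) and returns the leftmost non-negative placement; the function name and input format are hypothetical.

```python
def solve_constraints(n, constraints):
    """Bellman-Ford (longest-path form) for constraints x[j] - x[i] >= d.

    n: number of variables; constraints: list of (i, j, d) triples.
    Returns the smallest non-negative solution, or None if the system is
    infeasible (the constraint graph contains a positive-weight cycle).
    Worst-case cost is O(|V||E|): up to n passes over all constraints.
    """
    x = [0] * n
    for _ in range(n):
        changed = False
        for i, j, d in constraints:
            if x[i] + d > x[j]:
                x[j] = x[i] + d  # push x[j] right to satisfy the constraint
                changed = True
        if not changed:
            return x
    return None  # still changing after n passes: positive cycle


# Three edges: 0 -> 1 needs spacing 2, 1 -> 2 needs 3, 0 -> 2 needs 4.
print(solve_constraints(3, [(0, 1, 2), (1, 2, 3), (0, 2, 4)]))
```

The early exit when a pass changes nothing is the simplest of the heuristics that make the method fast on typical compaction inputs, even though the worst case remains quadratic.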

Susmita  Sur  is  currently  writing  up  the  work  on  channel  stretching  in  the 
PI  project. 
1. Introduction


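The work of Rubinstein et al. [1] and Horowitz [3] on bounds for the step response of
a linear RC tree concerns the system (for the case of a falling transient)

$$\sum_{i=1}^{N} R_{ki} C_i \dot{v}_i = -v_k, \qquad v_k(0) = 1, \quad k = 1, \ldots, N. \tag{1.1}$$

It was shown that for an RC tree,

$$R_{kk} v_i(t) - R_{ki} v_k(t) \ge 0, \tag{1.2}$$

$$-\dot{v}_k(t) \ge 0, \qquad \forall\, t \ge 0. \tag{1.3}$$

It follows from (1.2) and (1.3) that

$$T_{Re}\, v_e(t) \le g_e(t) \le T_P\, v_e(t), \tag{1.4}$$

where

$$g_e(t) = \int_t^{\infty} v_e(\tau)\, d\tau, \tag{1.5}$$

$$T_P = \sum_{k=1}^{N} R_{kk} C_k, \tag{1.6}$$

$$T_{Re} = \sum_{k=1}^{N} \frac{R_{ke}^{2} C_k}{R_{ee}}. \tag{1.7}$$

Using (1.1) - (1.7), upper and lower bounds for $v_e(t)$, $0 \le t < \infty$, were derived.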
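
The time constants entering these bounds are easy to compute for a concrete tree. The sketch below is an illustration under stated assumptions: the tree is described by a parent array, `resistance[k]` is the resistor between node k and its parent, and $R_{ke}$ is taken, as in [1], to be the resistance of the part of the source-to-e path shared with the source-to-k path. It also returns $T_{De} = \sum_k R_{ke} C_k$, the Elmore delay of [1], which equals $g_e(0)$ for the falling transient. The function name and data layout are hypothetical.

```python
def rc_tree_time_constants(parent, resistance, capacitance, e):
    """Return (T_P, T_De, T_Re) for output node e of an RC tree.

    parent[k]      : parent of node k, or -1 if k hangs off the source
    resistance[k]  : resistor between node k and its parent (or the source)
    capacitance[k] : capacitor from node k to ground
    """
    def path(k):
        # Set of nodes on the path from the source to k; each node stands
        # for the resistor that connects it to its parent.
        nodes = set()
        while k != -1:
            nodes.add(k)
            k = parent[k]
        return nodes

    def shared(a, b):
        # R_ab: resistance common to the source->a and source->b paths.
        return sum(resistance[k] for k in path(a) & path(b))

    n = len(parent)
    t_p = sum(shared(k, k) * capacitance[k] for k in range(n))
    t_de = sum(shared(k, e) * capacitance[k] for k in range(n))
    t_re = sum(shared(k, e) ** 2 * capacitance[k] for k in range(n)) / shared(e, e)
    return t_p, t_de, t_re


# Two-section RC line: source -R- node0 -R- node1, unit R and C, output at node1.
print(rc_tree_time_constants([-1, 0], [1.0, 1.0], [1.0, 1.0], e=1))
```

For any RC tree the three values satisfy $T_{Re} \le T_{De} \le T_P$, which is a convenient sanity check on an implementation.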

Charles Zukowski of M.I.T. made the remarkable observation that these bounds can
be interpreted as solutions of certain optimal control problems. Consider, for example, the
lower bound. Suppose $v_e(t)$ solves the following minimum-time problem, in which an input
$u(\cdot)$ is introduced to represent the unknown waveform $\dot{v}_e(\cdot)$:

Minimize $t_f$

with

$$\dot{g}_e(t) = -v_e(t), \qquad g_e(0) = T_{De}, \tag{1.8}$$

$$\dot{v}_e(t) = u(t), \qquad v_e(0) = 1, \tag{1.9}$$

$$v_e(t_f) = V_e^*, \qquad 0 < V_e^* < 1, \tag{1.10}$$

$$u(t) \le 0, \qquad \forall\, t \ge 0, \tag{1.11}$$

$$T_{Re}\, v_e(t) \le g_e(t) \le T_P\, v_e(t), \qquad \forall\, t \ge 0. \tag{1.12}$$

We denote this minimum time $t_{min}(V_e^*)$, since it depends on the "target" voltage $V_e^*$.
Since $v_e(0) > V_e^*$ and $v_e(\cdot)$ is continuous, $v_e(t) > V_e^*$, $\forall\, t \in [0, t_{min}(V_e^*))$; in particular $V_e^*$
is a lower bound on $v_e(t_{min}(V_e^*))$. Thus the inverse of the function $t_{min}(V_e^*)$, denoted here
$v_e^{L}(t)$, is a lower bound on all solutions $v_e(t)$ of (1.8) - (1.12).

This interpretation allows a particularly simple derivation of the bounds in [1]. We can
also use the optimal control approach to derive tighter bounds that result from imposing
additional constraints on the control $u(\cdot)$. Yu [2] has shown that for any RC tree and any
node $e$ there exists a $T_{ee} > 0$ such that

$$\dot{v}_e(t) \ge -\frac{1}{T_{ee}}\, v_e(t), \qquad \forall\, t \ge 0. \tag{1.13}$$

In terms of (1.9) and (1.11), (1.13) translates into a new constraint on the control,

$$u(t) \ge -\frac{1}{T_{ee}}\, v_e(t). \tag{1.14}$$

The nature of the optimal trajectory for the minimum time problem with the additional
constraint (1.14) is discussed in [2] for some special cases. The purpose of this memo is to
provide a complete and rigorous derivation of the solutions of the optimal control problems
for both maximum-time (upper bound) and minimum-time (lower bound) and for all values
of $T_{ee}$, without invoking Pontryagin's maximum principle [4]. The problem of deriving a
numerical value for $T_{ee}$ for a given tree was studied in detail in [2] and will not be discussed
here.


2.  Minimum-Time  Trajectory 


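Problem Statement (Minimum-Time Problem) :

For each "target" voltage $V_e^*$, $0 < V_e^* < 1$, determine

$$\min_{u \in U} t_f$$

with

System Dynamics :

$$\dot{g}_e(t) = -v_e(t), \qquad g_e(0) = T_{De}; \tag{2.1}$$

$$\dot{v}_e(t) = u(t), \qquad v_e(0) = 1; \tag{2.2}$$

Target :

$$v_e(t_f) = V_e^*, \qquad 0 < V_e^* < 1; \tag{2.3}$$

Admissible Control Set :

$$U = \Big\{ \text{all piecewise continuous } u(\cdot) \;\Big|\; -\frac{1}{T_{ee}}\, v_e(t) \le u(t) \le 0, \; 0 \le t \le t_f \Big\}; \tag{2.4}$$

State Constraints :

$$T_{Re}\, v_e(t) \le g_e(t) \le T_P\, v_e(t). \tag{2.5}$$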

The  solution  of  the  minimum  time  problem  can  be  derived  from  the  following  two 
lemmas,  without  resorting  to  optimal  control  theory. 

Lemma  2.1 

Let $u^*(t)$, $0 \le t \le t_f$, be the optimal control of the minimum-time problem without the
state constraints (2.5). If the solution $g_e^*(t)$ and $v_e^*(t)$ of (2.1) and (2.2) with control $u(t) = u^*(t)$
does not violate the state constraints (2.5), then $u^*(t)$ is also the optimal control for the
minimum-time problem with state constraints (2.5).

The proof of Lemma 2.1 is trivial and is omitted. Its value is that the optimal strategy for
the problem without state constraints is particularly simple: to drive $v_e(t)$ from 1 to $V_e^* < 1$
as fast as possible, simply decrease $v_e(t)$ at the maximum allowable rate at each $t$, i.e.

$$u^*(t) = -\frac{1}{T_{ee}}\, v_e(t). \tag{2.6}$$

(To see this rigorously, assume $u(\cdot)$ is piecewise continuous and that $u(t) > u^*(t)$, $\forall\, t \in J$,
where $J \subset [0, t_f]$ is a nonzero time interval. Therefore $\dot{v}_e(t) = u(t) \ge -v_e(t)/T_{ee}$ $\Rightarrow$
$T_{ee}\dot{v}_e(t) + v_e(t) = w(t) \ge 0$ $\forall\, t \in [0, t_f]$ and $w(t) > 0$ $\forall\, t \in J$. By integration, we obtain
$v_e(t) = v_e(0)\exp\{-t/T_{ee}\} + \frac{1}{T_{ee}}\int_0^t w(t - \tau)\exp\{-\tau/T_{ee}\}\, d\tau > v_e(0)\exp\{-t/T_{ee}\} = v_e^*(t)$, since both
$w(t)$ and the impulse response $\exp\{-t/T_{ee}\}/T_{ee}$ are positive over $J$.)

Moreover the trajectory under the action of $u^*(\cdot)$ is a straight line with slope $T_{ee}$ in the
$g - v$ plane because

$$\frac{dg_e}{dv_e} = \frac{\dot{g}_e(t)}{\dot{v}_e(t)} = \frac{-v_e(t)}{u^*(t)} = T_{ee}. \tag{2.7}$$

Lemma  2.2 

Consider two trajectories I and II with common initial and final states A and B, such
that I lies entirely above II as shown in Fig. 2.1. Then the time taken to reach B from A
along I is strictly longer than that along II.

Proof :

Since $\dot{g}_e = -v_e$, the time taken to reach B from A along any path P in the plane is
given by

$$t^{P}_{A \to B} = \int_{P} \frac{-dg_e}{v_e}.$$

Since path I lies above II, path II must lie to the right of I, as shown in Fig. 2.1. Therefore,
writing $g_A$ and $g_B$ for the values of $g_e$ at A and B,

$$t^{I}_{A \to B} = \int_{g_B}^{g_A} \frac{dg}{v^{I}(g)} > \int_{g_B}^{g_A} \frac{dg}{v^{II}(g)} = t^{II}_{A \to B}$$

(since $v^{II}(g) > v^{I}(g)$ for each fixed $g$). $\blacksquare$


2.1. Case A : $0 < T_{ee} \le T_{Re}$

See Figs. 2.2(i - iii).

Proposition 2.1

(i) For $1 \ge V_e^* \ge \dfrac{T_{De} - T_{ee}}{T_P - T_{ee}}$,

$$t_f = T_{ee} \ln \frac{1}{V_e^*}. \tag{2.8}$$

(ii) For $\dfrac{T_{De} - T_{ee}}{T_P - T_{ee}} \ge V_e^* \ge \dfrac{T_{Re} - T_{ee}}{T_P - T_{ee}}$,

$$t_f = T_{De} - T_{ee} - (T_P - T_{ee})\, V_e^* + T_{ee} \ln \frac{1}{V_e^*}. \tag{2.9}$$

(iii) For $\dfrac{T_{Re} - T_{ee}}{T_P - T_{ee}} \ge V_e^* > 0$,

$$t_f = T_{De} - T_{Re} + T_{Re} \ln\left[\frac{T_{Re} - T_{ee}}{(T_P - T_{ee})\, V_e^*}\right] + T_{ee} \ln\left[\frac{T_P - T_{ee}}{T_{Re} - T_{ee}}\right]. \tag{2.10}$$

Proof :

(i) From Fig. 2.2(i), $V_e^*$ can be reached from A by decreasing $v_e(t)$ at the maximum
allowable rate, without violating the state constraints (2.5). This is in fact the optimal
strategy for the minimum time problem without the state constraints (2.5). Hence path
AB is the optimal trajectory by Lemma 2.1. Therefore

$$\dot{v}_e(t) = u^*(t) = -\frac{1}{T_{ee}}\, v_e(t), \quad v_e(0) = 1 \;\Rightarrow\; v_e(t) = \exp\{-t/T_{ee}\} \;\Rightarrow\; t_f = T_{ee} \ln \frac{1}{V_e^*}.$$

(ii) We claim that trajectory ABC in Fig. 2.2(ii) is optimal. To see this, consider any other
trajectory which has the same initial condition $(1, T_{De})$ and terminates at the target
set, but differs from ABC on some portion of its path. Such a trajectory either lies
entirely above ABC, such as I, or meets ABC at some point, say B', such as II. I is
nonoptimal by Lemma 2.2. II is also nonoptimal because along II AB' takes longer
than ABB' by Lemma 2.2 and B'C' takes longer than B'C by Lemma 2.1. Adding the
times along AB and BC yields (2.9).

(iii) Path ABCD in Fig. 2.2(iii) is optimal by the same reasoning as in part (ii). Adding the
times along AB, BC and CD yields (2.10).
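
The three regimes of Proposition 2.1 can be collected into one function, which makes it easy to check that the pieces join continuously at the breakpoints. The sketch assumes the piecewise expressions take the form $t_f = T_{ee}\ln(1/V)$ for large targets, $t_f = T_{De} - T_{ee} - (T_P - T_{ee})V + T_{ee}\ln(1/V)$ in the middle range, and, for small targets, a constant-voltage segment of duration $T_{De} - T_{Re}$ followed by a slide along the lower state constraint and a final maximum-rate segment. The function name is hypothetical.

```python
import math


def t_min_case_a(v_target, t_p, t_de, t_re, t_ee):
    """Minimum time to reach v_target in Case A (0 < T_ee <= T_Re).

    Assumes 0 < t_ee <= t_re <= t_de <= t_p, t_ee < t_p, 0 < v_target <= 1.
    """
    if not (0.0 < t_ee <= t_re <= t_de <= t_p and t_ee < t_p):
        raise ValueError("time constants do not satisfy Case A")
    if not (0.0 < v_target <= 1.0):
        raise ValueError("target voltage must lie in (0, 1]")
    v_hi = (t_de - t_ee) / (t_p - t_ee)  # breakpoint between (i) and (ii)
    v_lo = (t_re - t_ee) / (t_p - t_ee)  # breakpoint between (ii) and (iii)
    if v_target >= v_hi:
        # (i) maximum-rate decay all the way to the target
        return t_ee * math.log(1.0 / v_target)
    if v_target >= v_lo:
        # (ii) wait at v = 1, then maximum-rate decay to the upper constraint
        return (t_de - t_ee - (t_p - t_ee) * v_target
                + t_ee * math.log(1.0 / v_target))
    # (iii) wait, slide along g = T_Re * v, then maximum-rate decay
    return (t_de - t_re
            + t_re * math.log((t_re - t_ee) / ((t_p - t_ee) * v_target))
            + t_ee * math.log((t_p - t_ee) / (t_re - t_ee)))


for v in (0.9, 0.6, 0.5, 0.4, 0.1):
    print(v, round(t_min_case_a(v, 3.0, 2.0, 1.5, 0.5), 4))
```

Letting `t_ee` tend to zero in branches (ii) and (iii) recovers the familiar linear and exponential lower bounds of [1], which is a useful consistency check.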


2.2. Case B : $T_{Re} < T_{ee} \le T_{De}$

See Figs. 2.3(i - ii).

Proposition 2.2

(i) For $1 \ge V_e^* \ge \dfrac{T_{De} - T_{ee}}{T_P - T_{ee}}$, the optimal trajectory is AB ( Fig. 2.3(i) ) and

$$t_f = T_{ee} \ln \frac{1}{V_e^*}. \tag{2.11}$$

(ii) For $\dfrac{T_{De} - T_{ee}}{T_P - T_{ee}} \ge V_e^* > 0$, the optimal trajectory is ABC ( Fig. 2.3(ii) ) and

$$t_f = T_{De} - T_{ee} - V_e^*\,(T_P - T_{ee}) + T_{ee} \ln \frac{1}{V_e^*}. \tag{2.12}$$

The proof of Proposition 2.2 is exactly the same as that of Proposition 2.1 and is omitted.

2.3. Case C : $T_{De} < T_{ee}$

In this case, the minimum value of $V_e^*$ attainable from $v_e(0) = 1$ using admissible
controls $u(\cdot) \in U$ without violating the state constraints (2.5) is $(T_{De} - T_{ee})/(T_{Re} - T_{ee})$, as is
apparent from Fig. 2.4. No trajectory can satisfy all the constraints and tend asymptotically
to the origin as $t \to \infty$. In terms of the RC tree problem, case C can only arise through a
modelling error, so we will not pursue it further.


3.  Maximum-Time  Trajectory 


Problem Statement (Maximum-Time Problem) :

For each "target" voltage $V_e^*$, $0 < V_e^* < 1$, determine

$$\max_{u \in U} t_f$$

with

System Dynamics :

$$\dot{g}_e(t) = -v_e(t), \qquad g_e(0) = T_{De}; \tag{3.1}$$

$$\dot{v}_e(t) = u(t), \qquad v_e(0) = 1; \tag{3.2}$$

Terminal Time Constraint :

$$v_e(t_f) = V_e^*, \qquad 0 < V_e^* < 1; \tag{3.3}$$

Admissible Control Set :

$$U = \Big\{ \text{all piecewise continuous } u(\cdot) \;\Big|\; -\frac{1}{T_{ee}}\, v_e(t) \le u(t) \le 0, \; 0 \le t \le t_f \Big\}; \tag{3.4}$$

State Constraints :

$$T_{Re}\, v_e(t) \le g_e(t) \le T_P\, v_e(t). \tag{3.5}$$

The maximum-time problem is easier than the minimum-time problem and its solution
for different values of $T_{ee}$ and $V_e^*$ can be derived using Lemma 2.2 only. We state the
solutions below. Proofs are omitted.


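3.1. Case A : $0 < T_{ee} \le T_{De}$

See Figs. 3.1(i - ii).

Proposition 3.1

(i) For $1 \ge V_e^* \ge \dfrac{T_{De} - T_{ee}}{T_P - T_{ee}}$, the optimal trajectory is AB ( Fig. 3.1(i) ) and

$$t_f = T_{ee} \ln \frac{1}{V_e^*} + \frac{1}{V_e^*}\left\{ (T_{ee} - T_{Re})\, V_e^* + (T_{De} - T_{ee}) \right\}. \tag{3.6}$$

(ii) For $\dfrac{T_{De} - T_{ee}}{T_P - T_{ee}} \ge V_e^* > 0$, the optimal trajectory is ABCD ( Fig. 3.1(ii) ) and

$$t_f = T_P - T_{Re} + T_{ee} \ln\left[\frac{T_P - T_{ee}}{T_{De} - T_{ee}}\right] + T_P \ln\left[\frac{T_{De} - T_{ee}}{T_P - T_{ee}} \cdot \frac{1}{V_e^*}\right]. \tag{3.7}$$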
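
The maximum-time result can be evaluated the same way. The sketch below assumes the first branch has the form $t_f = T_{ee}\ln(1/V) + \{(T_{ee} - T_{Re})V + (T_{De} - T_{ee})\}/V$ and that, for smaller targets, the trajectory decays at the maximum rate to the upper state constraint $g_e = T_P v_e$, slides along it, and finally holds the voltage constant until the lower constraint is reached. The second branch in particular is a reconstruction under that assumption, and the function name is hypothetical.

```python
import math


def t_max_case_a(v_target, t_p, t_de, t_re, t_ee):
    """Maximum time to reach v_target, assuming 0 < T_ee < T_De <= T_P."""
    if not (0.0 < t_ee < t_de <= t_p and 0.0 < t_re <= t_de):
        raise ValueError("time constants outside the assumed range")
    if not (0.0 < v_target <= 1.0):
        raise ValueError("target voltage must lie in (0, 1]")
    v_b = (t_de - t_ee) / (t_p - t_ee)  # where max-rate decay meets g = T_P * v
    if v_target >= v_b:
        # decay at the maximum rate to v_target, then hold until g = T_Re * v
        return (t_ee * math.log(1.0 / v_target)
                + ((t_ee - t_re) * v_target + (t_de - t_ee)) / v_target)
    # decay to the upper constraint, slide along it, then hold (assumed path)
    return (t_p - t_re
            + t_ee * math.log(1.0 / v_b)
            + t_p * math.log(v_b / v_target))


for v in (1.0, 0.8, 0.6, 0.3):
    print(v, round(t_max_case_a(v, 3.0, 2.0, 1.5, 0.5), 4))
```

As `t_ee` tends to zero the first branch reduces to $t = T_{De}/V - T_{Re}$, the inverse of the upper bound $T_{De}/(t + T_{Re})$ known from [1].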


3.2. Case C : $T_{De} < T_{ee}$


The  results  here  are  identical  to  those  in  section  2.3. 


4.  Conclusion 


We invert, where possible, the expressions derived in Props. 2.1 - 2.3 and Props. 3.1
- 3.2 to obtain the final expressions below for the upper and lower waveform bounds. For
the case $T_{De} < T_{ee}$, where not all values of $V_e^* < 1$ are attainable, the upper and lower
bounds are not meaningful and therefore are omitted here.

4.1.  Lower  Bounds 


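4.1.1. Case A : $0 < T_{ee} \le T_{Re}$

(i) For $0 \le t \le T_{ee} \ln\left[\dfrac{T_P - T_{ee}}{T_{De} - T_{ee}}\right]$,

$$v_e^{L}(t) = \exp\left\{-\frac{t}{T_{ee}}\right\}.$$

(ii) For $T_{ee} \ln\left[\dfrac{T_P - T_{ee}}{T_{De} - T_{ee}}\right] \le t \le T_{De} - T_{Re} + T_{ee} \ln\left[\dfrac{T_P - T_{ee}}{T_{Re} - T_{ee}}\right]$, there is no explicit function for
$v_e^{L}(t)$, because (2.9) is not invertible.

(iii) For $t \ge T_{De} - T_{Re} + T_{ee} \ln\left[\dfrac{T_P - T_{ee}}{T_{Re} - T_{ee}}\right]$,

$$v_e^{L}(t) = \frac{T_{Re} - T_{ee}}{T_P - T_{ee}} \exp\left\{-\frac{t - (T_{De} - T_{Re}) - T_{ee} \ln[(T_P - T_{ee})/(T_{Re} - T_{ee})]}{T_{Re}}\right\}.$$

4.1.2. Case B : $T_{Re} < T_{ee} \le T_{De}$

(i) For $0 \le t \le T_{ee} \ln\left[\dfrac{T_P - T_{ee}}{T_{De} - T_{ee}}\right]$,

$$v_e^{L}(t) = \exp\left\{-\frac{t}{T_{ee}}\right\}.$$

(ii) For $t \ge T_{ee} \ln\left[\dfrac{T_P - T_{ee}}{T_{De} - T_{ee}}\right]$, there is no explicit expression for $v_e^{L}(t)$, because (2.12) is not
invertible.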
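
Although the middle-range expression has no closed-form inverse, the minimum time is strictly decreasing in the target voltage, so the lower bound can be evaluated numerically by bisection. The sketch below does this for Case B, assuming the minimum-time expression $t = T_{De} - T_{ee} - (T_P - T_{ee})V + T_{ee}\ln(1/V)$ for targets below the breakpoint $(T_{De} - T_{ee})/(T_P - T_{ee})$; the function name and the tolerance are illustrative choices.

```python
import math


def lower_bound_case_b(t, t_p, t_de, t_ee, tol=1e-12):
    """Numerically evaluate the lower waveform bound at time t (Case B).

    Assumes T_Re < t_ee < t_de <= t_p and t >= 0.
    """
    v_b = (t_de - t_ee) / (t_p - t_ee)   # breakpoint voltage
    t_b = t_ee * math.log(1.0 / v_b)     # time at which it is reached
    if t <= t_b:
        return math.exp(-t / t_ee)       # explicit branch

    def t_min(v):
        # Assumed minimum-time expression; strictly decreasing in v.
        return t_de - t_ee - (t_p - t_ee) * v + t_ee * math.log(1.0 / v)

    lo, hi = 0.0, v_b                    # t_min -> +inf as v -> 0, t_min(v_b) = t_b
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if t_min(mid) > t:
            lo = mid                     # mid is reached too late: bound is higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


for t in (0.0, 1.0, 3.0, 4.0, 6.0):
    print(t, lower_bound_case_b(t, 3.0, 2.0, 1.8))
```

The same bisection applies to the middle range of Case A, with the explicit exponential expression taking over again beyond the second breakpoint.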


4.2.  Upper  Bounds 


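4.2.1. Case A : $0 < T_{ee} \le T_{De}$

(i) For $0 \le t \le T_P - T_{Re} + T_{ee} \ln\left[\dfrac{T_P - T_{ee}}{T_{De} - T_{ee}}\right]$, there is no explicit expression for $v_e^{U}(t)$, because
(3.6) is not invertible.

(ii) For $t \ge T_P - T_{Re} + T_{ee} \ln\left[\dfrac{T_P - T_{ee}}{T_{De} - T_{ee}}\right]$,

$$v_e^{U}(t) = \frac{T_{De} - T_{ee}}{T_P - T_{ee}} \exp\left\{-\frac{t - (T_P - T_{Re}) - T_{ee} \ln[(T_P - T_{ee})/(T_{De} - T_{ee})]}{T_P}\right\}.$$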
Electrical circuit designers seldom create really new topologies or use
old ones in a novel way. Most designs are known combinations of common
configurations tailored for the particular problem at hand. In this thesis I
show that much of the behavior of a designer engaged in such ordinary design
can be modeled by a clearly defined computational mechanism executing a set
of stylized rules. Each of my rules embodies a particular piece of the
designer's knowledge.

A  circuit  is  represented  as  a  hierarchy  of  abstract  objects,  each  of 
which  is  composed  of  other  objects.  The  leaves  of  this  tree  represent  the 
physical  devices  from  which  physical  circuits  are  fabricated.  By  analogy  with 
context-free  languages,  a  class  of  circuits  is  generated  by  a  phrase-structure 
grammar,  of  which  each  rule  describes  how  one  type  of  abstract  object  can  be 
expanded  into  a  combination  of  more  concrete  parts. 
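
The grammar view of a circuit can be illustrated with a toy expansion routine. Everything in the sketch is hypothetical: the rule names in `GRAMMAR`, the choice of always taking the first applicable rule, and the tuple encoding of the hierarchy are assumptions for the example and are not taken from the thesis or its program.

```python
# Each abstract object maps to a list of alternative expansions (rules).
# Symbols with no entry are leaves, i.e. physical devices.
GRAMMAR = {
    "op-amp": [["differential-stage", "output-stage"]],
    "differential-stage": [["transistor", "transistor", "current-source"]],
    "current-source": [["transistor", "resistor"]],
    "output-stage": [["transistor", "resistor"], ["transistor", "transistor"]],
}


def expand(symbol, grammar, choose=lambda rules: rules[0]):
    """Expand an abstract object into a hierarchy of more concrete parts.

    choose picks one rule among the alternatives; a real designer model
    would use analysis of the partial circuit here (assumed hook).
    """
    if symbol not in grammar:
        return symbol  # leaf: a physical device
    parts = choose(grammar[symbol])
    return (symbol, [expand(part, grammar, choose) for part in parts])


def leaves(tree):
    """Collect the physical devices at the leaves of the hierarchy."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    return [leaf for child in children for leaf in leaves(child)]


circuit = expand("op-amp", GRAMMAR)
print(leaves(circuit))
```

Passing a different `choose` function selects other rules for the same abstract object, which corresponds to exploring alternative lines of expansion.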


Circuits are designed by first postulating an abstract object which meets
the particular design requirements. This object is then expanded into a
concrete circuit by successive refinement using rules of my grammar. There are
in general many rules which can be used to expand a given abstract component.
Analysis must be done at each level of the expansion to constrain the search
to a reasonable set. Thus the rules of my circuit grammar provide constraints
which allow the approximate qualitative analysis of partially instantiated
circuits. Later, more careful analysis in terms of more concrete components
may lead to the rejection of a line of expansion which at first looked
promising. I provide special failure rules to direct the repair in this case.
As part of this research I have developed a computer program, CIROP, which
implements my theory in the domain of operational amplifier design.
Macromodeling of Digital MOS VLSI Circuits


Abstract 

This  paper  presents  a  method  for  modeling  MOS  combinational  logic  gates.  Analyses  are  given  for 
power  consumption,  output  response  delay,  output  response  waveshape,  and  input  capacitance. 
The  models  are  both  computationally  efficient  and  accurate,  typically  lying  within  5%  of  SPICE 
estimates.  They  are  pertinent  to  simulation  and  optimization  applications.  A  general  macromodeling 
software  support  package  is  described.  A  companion  paper  discusses  a  circuit  optimizer  based  on 
these  models. 
1. Introduction

This  paper  discusses  accurate,  computationally  efficient  models  for  MOS  logic  gates.  The  models 
are  well  suited  for  simulation  and  optimization  of  high  performance  VLSI  circuits.  The  models  are 
based  on  device  equations,  and  acquire  much  of  their  accuracy  through  careful  consideration  of 
waveshape  effects. 

The significance of waveshape effects has been investigated by other workers. Crystal [1], a timing
simulator, models transistors as resistors, but uses different values for transistor resistances
depending on input waveform. While this leads to good accuracies (typically within 10% of SPICE
predictions), the approach does have some limitations. For example, the tables of effective transistor
resistances depend on a uniform trigger voltage (the point on a logic gate's transfer curve where V_OUT
= V_IN) and can produce substantial errors if this restriction is removed, for instance by varying beta
ratios. Moreover, the table interpolations can generate jagged delay functions; this can make the
optimization task more difficult.

For  these  reasons  we  chose  to  base  our  models  entirely  on  device  equations.  Horowitz  [2]  pursued 
a  similar  strategy  in  modeling  the  delay  of  a  MOS  inverter.  He  derived  equations  for  the  gate's 
response and then obtained estimates of parameters from the gate's drive curves (curves of V_OUT
versus V_IN for different values of load current).

In  this  paper  we  describe  more  general  and  sophisticated  models.  We  develop  equations  for  power 
consumption,  output  waveform,  and  input  capacitance  of  a  general  MOS  logic  gate.  To  obtain  high 
accuracy  in  the  model,  we  wrote  a  macromodeling  support  package  to  determine  the  equations’ 
parameters.  The  package  curve  fits  the  model  equations  to  SPICE  simulation  results  and  finds  the 
parameter  set  which  provides  the  highest  accuracy. 

Section  2  discusses  the  basic  principles  of  the  macromodeling  approach.  Section  3  presents 
models  for  MOS  inverters.  We  begin  with  a  resistor-capacitor  model  and  discuss  its  limitations.  We 
then  develop  a  more  elaborate  model,  one  accounting  for  waveform  shape  effects.  The  analysis  is 
extended  to  more  general  logic  gates  in  section  4.  The  theory  gives  us  the  form  of  the  macromodel 
equations.  In  section  5  we  describe  how  the  equations'  parameters  are  determined  with  a 
sophisticated  macromodeling  support  package. 


2.  Motivation  and  Intent 

Circuit  optimization  is  a  computationally  expensive  process.  It  is  an  iterative  procedure,  requiring 
multiple  simulations  at  each  step  to  evaluate  delays  and  their  gradients.  Moreover,  high  performance 
circuit  design  requires  fairly  accurate  delay  estimates,  but  using  a  device  level  simulator  would  be  out 
of  the  question  for  all  but  small  circuits. 

Since  it  is  too  time  consuming  to  compute  circuit  responses  during  the  optimization,  we  instead 
pursue  an  approach  where  much  of  the  work  is  performed  prior  to  the  optimization.  We  divide  a  large 
circuit  into  many  small  pieces.  This  partitioning  is  done  such  that  the  pieces  have  limited,  well 
understood  interactions,  while  the  elements  inside  the  pieces  have  strong,  complex  interactions. 
Thus  computing  the  interactions  among  elements  within  a  piece  would  be  very  expensive,  and  it 
behooves  us  to  characterize  the  behavior  of  the  pieces  beforehand  to  avoid  having  to  compute  it 
during  the  optimization. 

This  approach  is  called  macromodeling.  In  the  digital  MOS  domain,  candidates  for  pieces  would  be 
cells  such  as  logic  gates  and  storage  elements.  We  model  the  attributes  of  the  cells  as  functions  of 
the  cell's  internal  description  and  boundary  conditions.  In  particular,  we  are  concerned  with  a  cell's 
power,  input  load,  and  output  waveform  attributes.  The  cell's  internal  description  consists  of  its 
transistor  sizes,  layout  parasitics,  and  process  parameters.  Boundary  conditions  are  imposed  on  the 
cell  by  external  agents.  These  include  input  waveforms  from  drivers  and  output  loading  from 
receivers  and  wiring  capacitances.  We  characterize  waveforms  as  time-shifted  ramps  with 
exponential tails. This waveshape is representative of those found in digital MOS circuits.¹ Figure 1
displays an example. The chain of inverters is driven by a falling input waveform; the figure shows the
output waveform of each gate. Here T_BE denotes the time shift, and T_SW the time constant of the
exponential portion. Conceptually T_BE is the time until the output begins to move in response to an
input transition, and T_SW is a measure of how quickly the output switches once it does begin to
change. We curve fit actual circuit waveforms to the time-shifted ramps with exponential tails. From
the figure we see that the output waveform of the chain of inverters is described by

    T_{BE,out}^{chain} = \sum_{i=1}^{n} T_{BE,i}

    T_{SW,out}^{chain} = T_{SW,out,n}

Figure 1: Waveform Characterization

¹ Actual circuit waveforms begin more smoothly than our approximation. However, the error is negligible because the logic
gate driven by the waveform does not really begin to switch until the waveform reaches its trigger voltage (the point
on the dc transfer curve where V_in = V_out), and is therefore insensitive to the shape of the first part of the waveform.

We characterize output loads in terms of an effective capacitance, dividing charge transferred by
change in voltage. This allows us to model RC interconnection networks, since the effective
capacitance can be a function of waveform slope.
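The chain relations above (begin times add, and the chain's switching time is that of the last stage) can be sketched in a few lines. This is only an illustration: the `StageWaveform` type and the `chain_waveform` function are hypothetical names introduced here, not part of the macromodeling package described in this paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StageWaveform:
    """Output waveform of one gate (hypothetical container for this sketch)."""
    t_be: float  # time until the output begins to move
    t_sw: float  # time constant of the switching portion


def chain_waveform(stages: Sequence[StageWaveform]) -> StageWaveform:
    """Combine per-stage waveforms into the waveform at the end of the chain."""
    if not stages:
        raise ValueError("chain must contain at least one stage")
    # Begin times accumulate along the chain; the switching time seen at the
    # chain output is simply that of the final stage.
    return StageWaveform(
        t_be=sum(s.t_be for s in stages),
        t_sw=stages[-1].t_sw,
    )
```

For example, three stages with begin times 1.0, 2.0, and 1.5 give a chain begin time of 4.5, while the chain switching time is whatever the third stage produces.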

We "black-box" the cell as shown in Figure 2. The cell is affected by its environment via the
boundary conditions T_SWin and C_L. It interacts with its neighbors via its interface attributes, the
input capacitance and T_SWout. The internal attributes power and T_BEout are isolated from the
environment and have no influence on the attributes of the cell's neighbors.

3.  Inverters 

We  begin  our  macromodeling  analysis  with  the  ubiquitous  inverter,  illustrated  in  Figure  3.  The 
results  will  be  extended  to  more  general  gates  in  a  subsequent  section.  For  the  sake  of  conciseness, 
our  analysis  is  only  shown  for  rising  input,  falling  output  nMOS  gates.  The  macromodel  equations  for 
the  opposite  transition  and  for  CMOS  are  similar.  We  will  present  the  actual  macromodel  equations 
(for  both  transitions)  in  the  section  on  general  logic  gates. 
Figure 2: Macromodel Representation


Figure 3: A Depletion Load nMOS Inverter

3.1. Objective Function

The  practicing  engineer  typically  must  design  circuits  such  that  they  satisfy  delay  specifications. 
The  engineer  also  desires  to  minimize  some  objective  function  subject  to  those  delay  constraints. 
Power  dissipation  is  a  major  concern  in  ratioed  nMOS  technology.  We  accordingly  choose  to 
minimize  power  dissipation,  which  for  nMOS  is  dominated  by  static  power  consumption.  The  static 
power  consumed  by  an  nMOS  inverter  is  roughly  proportional  to  the  shape  factor  of  the  pullup;  that 
is, 
    Power = a_1 + a_2 \cdot (\text{pullup shape factor})

where a_1 and a_2 are constants that depend on the fabrication process and power supply voltage.

The choice of an objective function for CMOS circuits is not as clear. Typically a designer wishes to
minimize area, power dissipation, or some combination of the two. Characterizing the area
consumption is difficult because it is highly dependent on layout style. However, we can easily
describe the contribution of the transistors. This is simply

    Area \approx \text{Poly Pitch} \times \sum_{i=1}^{n} (\text{stack width})_i

where  we  have  omitted  the  transistor  lengths  because  for  CMOS  they  are  set  to  the  minimum  channel 
length. 

3.2.  Output  Waveform 
3.2.1. Resistive Model

Computational  limitations  mandate  the  use  of  a  simple  delay  model.  The  simplest  transistor 
representation  that  provides  tolerable  accuracy  is  a  switched  resistor.  The  MOS  transistor  is  modeled 
with  a  capacitor  from  the  gate  to  ground  and  a  switched  linear  resistor  from  drain  to  source.  The  gate 
to  source  voltage  controls  whether  the  resistor  is  switched  on  or  off.  The  delay  characteristics  of  the 
model,  along  with  their  implications  for  circuit  optimization,  have  been  analyzed  in  [3]  and  [4].  The 
principal  advantage  of  the  model  is  its  simplicity,  which  allows  one  to  derive  closed  form  expressions 
for  the  optimal  transistor  sizes,  leading  to  fast  run  times.  Unfortunately  the  model  can  be  alarmingly 
inaccurate.  Moreover  the  errors  can  be  exacerbated  by  the  optimization.  For  a  chain  of  similar  gates 
where  the  capacitive  loading  on  each  stage  is  dominated  by  the  input  capacitance  of  the  next  stage 
(rather  than  the  wiring  capacitance),  pushing  the  chain  for  speed  results  in  equal  stage  delays.  For 
nMOS  this  virtually  guarantees  that  while  for  rising  outputs  the  stage  is  insensitive  to  input 
waveshape,  for  falling  outputs  the  stage  is  highly  sensitive  to  input  slope.  This  sensitivity  means  that 
the  transistor  cannot  be  accurately  modeled  as  a  resistor,  and  the  effect  on  total  chain  delay  is 
significant  because  the  stage  delays  are  equal.  Rising  output  stage  delays,  for  which  the  resistive 
model  tends  to  be  valid,  do  not  dominate  the  total  delay.  The  model  exhibits  errors  of  up  to  70%, 
clearly  unacceptable  for  serious  circuit  design. 
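The statement that pushing a chain for speed equalizes the stage delays can be checked with a short worked derivation under the resistive model. The sketch below is illustrative only. It assumes a fixed number of stages n, a stage resistance r/w_i, and a load c w_{i+1} presented by the next stage (wiring and self capacitance ignored). The symbols r, c, and w_i are notation introduced here, not taken from the paper.

```latex
T = \sum_{i=1}^{n} \frac{r}{w_i}\, c\, w_{i+1}
  = r c \sum_{i=1}^{n} \frac{w_{i+1}}{w_i},
\qquad
\prod_{i=1}^{n} \frac{w_{i+1}}{w_i} = \frac{w_{n+1}}{w_1}
\quad \text{(fixed by the first stage and the final load)}.

% With the product of the ratios fixed, the arithmetic-geometric mean
% inequality says the sum is smallest when all ratios are equal:
\frac{w_{i+1}}{w_i} = \left( \frac{w_{n+1}}{w_1} \right)^{1/n}
\quad \Longrightarrow \quad
T_i = r c \left( \frac{w_{n+1}}{w_1} \right)^{1/n}
\ \text{for every stage } i .
```

Under these assumptions every stage contributes the same delay, so an error made on the falling-output stages is repeated along the whole chain rather than being masked by a few dominant stages.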
3.2.2. Extended Model

Faced with the inability of the resistive model to account for waveshape effects, we are compelled to
derive  a  more  elaborate  model.  Ever  mindful  of  computation  time  limitations,  we  pursue  the  simplest 
possible  extensions  that  will  provide  the  needed  accuracy.  We  begin  by  studying  the  inverter's 
response  to  different  input  waveform  slopes,  paying  particular  attention  to  the  different  regions  of 
transistor  operation. 

As  the  inverter's  input  rises  and  its  output  falls,  the  pullup  and  pulldown  transistors  sequence 
through  different  regions  of  operation.  These  regions  are  summarized  in  Table  1.  For  the  fast  input 
response² the bulk of the delay accrues from the last states where the pulldown is in its linear region.
Hence  the  pulldown  can  be  approximated  by  a  resistor,  and  the  resistive  model  works  well  here. 
However  for  slow  inputs  the  pulldown  is  saturated  for  a  significant  portion  of  the  transition,  causing 
the  inverter  to  behave  like  an  amplifier.  In  this  mode  the  inverter  is  highly  sensitive  to  the  input 
waveform  and  consequently  the  resistive  model  breaks  down. 


Fast Input Response

    pullup    pulldown
    linear    off
    linear    sat
    linear    linear
    sat       linear

Slow Input Response

    pullup    pulldown
    linear    off
    linear    sat
    sat       sat
    sat       linear

Table 1: Pullup/Pulldown Regions of Operation


We  seek  a  simple  model  that  includes  both  amplifier  and  resistor  behavior.  We  are  especially 
concerned  with  the  middle  and  latter  parts  of  the  input  transition,  for  it  is  here  that  the  inverter’s 
output is sinking the most current. The beginning of the transition is not as crucial. For slow inputs
the inverter can be modeled as an amplifier when the pulldown is saturated, and as a resistor
when the pulldown is in its linear region. As the input transition becomes faster, the inverter spends
proportionately less time as an amplifier and more as a resistor. The mapping of the inverter to an
amplifier and resistor is shown in Figure 4. For continuity of v_OUT and i_OUT when the model changes
from  an  amplifier  to  a  resistor,  we  use  the  resistor  model  when  ic#p9  +  v^. 

The ac model used for the amplifier behavior clashes with traditional engineering philosophy.
Normally one creates an ac model by linearizing a circuit about a quiescent operating point. Here,
however, we are interested in large signal behavior. Consequently, while we can perturb the system
from an initial point (in this case (v_IN, v_OUT) = (V_IL, V_OH)), we have no easy method to calculate the
model's parameters such as g_m and r_L. We cannot simply evaluate the parameters at a quiescent
operating point because we have none. We instead view the problem at a more objective-oriented
level, and seek to determine which values of g_m, r_L, etc., will provide the closest approximation to
observed response times. Moreover, rather than using the same set of parameter values for the rising
input and falling input responses (which would correspond to using a single group of drive curves to
characterize the inverter), we desire to acquire additional accuracy by using distinct sets for each
transition's begin and switch responses. This leads us to the following strategy: analytically derive
expressions for the form of the macromodel equations, then curve fit to observed data to solve for the
constants in the equations.

Figure 4: Delay Mapping

² "Fast" means that the input transition time is fast relative to the output transition time.

Closed  form  expressions  for  the  model's  response  to  different  input  waveforms  can  be  derived. 
Here  we  will  outline  the  basic  concepts;  the  actual  equations  will  be  given  at  the  end  of  section  4.  For 
very  fast  inputs  the  inverter  changes  from  an  amplifier  to  a  resistive  form  immediately.  In  other  words, 
the  first  order  resistive  model  is  valid.  Figure  4  shows  the  model.  For  fast  inputs  the  inverter  model 
does  not  change  immediately,  but  does  change  before  the  output  transition  completes.  We  use  the 
amplifier model of Figure 4 but omit the resistor r_L. The output switches quickly enough that the
current in the total capacitance C_total (C_L plus the gate's own output capacitance) dominates that
of r_L, allowing us to neglect the
resistor. For moderate inputs the input waveform is slow enough that the model changes after the
output has fallen. The current in C_total still dominates that of r_L. For slow inputs, the current through
r_L can no longer be neglected. Unfortunately this leads to equations which cannot be solved for
closed form expressions for T_BE and T_SW. For very slow inputs, the input and output waveforms have
slowed to the point where the current through C_total is almost negligible compared to that through r_L.
The  amplifier  system  reaches  steady  state,  exhibiting  a  constant  tracking  error  to  the  ramp  input, 
being  entirely  limited  by  the  speed  of  the  input  [5]. 

Having  described  a  method  for  determining  the  inverter’s  response  to  various  input  slopes,  we  now 
seek  a  means  of  combining  the  results  into  one  conglomerate  expression.  It  is  common  to  use 
smoothing  functions  to  effect  this  combination.  However  many  workers  fail  to  consider  the 
computational  overhead  incurred  with  these  functions.  We  instead  create  simple  functions  that 
exhibit  the  desired  behavior  in  each  of  the  input  slope  regimes.  To  avoid  placing  any  unnecessary 
burdens  on  the  optimization  algorithms,  we  choose  functions  that  are  twice  continuously 
differentiable. Although optimization algorithms exist for solving problems with ill-behaved (e.g.,
discontinuous) functions, because of their added generality these algorithms tend to be slower.
Moreover, we prefer functions that do not contain multiple maxima or minima; i.e., that are unimodal.
This  helps  eliminate  cusps  that  could  trap  an  optimizer's  iterative  solution  technique.  The  resulting 
inverter  equations  are  fully  described  and  analyzed  in  [6]. 

3.3.  Input  Capacitance 

Calculating  a  gate's  delay  requires  knowledge  of  the  input  capacitances  of  the  gates  that  it  drives. 
In  this  section  we  study  an  inverter's  input  capacitance.  Our  results  will  be  extended  to  more  general 
logic  gates  in  a  later  section. 

We  begin  by  considering  the  components  of  the  input  capacitance.  Figure  5  shows  our  model.  The 
input capacitance has two constituents: the gate to drain and gate to source transistor capacitances.

Figure 5: Input Capacitance Model


The input capacitance presented to the driver can change during the course of the input transition.
This effect is largely due to the input to output coupling capacitance c_gd. Consider a rising input
transition of moderate speed. During the beginning portion of the input waveform (that is, before the
input voltage has reached V_IL) the inverter's output has not yet moved significantly. The input
capacitance is therefore simply c_gs + c_gd. Both terms are proportional to the pulldown transistor's
width.

As the input voltage passes V_IL, the inverter begins to pull its output low. Consequently the driver
must  supply  more  current  to  charge  cgd  than  it  would  have  had  the  output  voltage  remained  fixed. 
This  is  called  the  Miller  effect  [7].  The  effective  input  capacitance  has  increased.  Note  that  the  total 
voltage  switch  across  cgs  is  always  •  V^,  while  that  across  cgd  is  2  (VQH  •  VQL),  but  we  are  only 
interested  in  the  capacitance  seen  during  the  beginning  and  switching  portions  of  the  input 
waveform.  For  very  fast  input  transitions  the  output  will  not  have  moved  until  after  the  input  has  fully 
switched;  hence  the  driver  will  not  have  seen  any  Miller  capacitance  during  the  actual  transition.  As 
we  slow  the  speed  of  the  input  transitions,  more  of  the  output's  switching  time  overlaps  with  the 
input's  and  we  see  more  Miller  capacitance.  Eventually  all  of  the  output's  switching  time  overlaps 
with the input's and C_SWin reaches a plateau. The expected behaviors of the effective input
capacitances C_BEin and C_SWin appear in Figure 6.

The analysis is complicated slightly by the fact that since c_gs and c_gd are functions of v_GS and v_GD,
they  not  only  vary  as  the  gate  switches,  but  their  average  value  during  the  input  transition  changes  as 
the  input  transition  slows  down.  The  outcome  of  this  change  in  cgd  is  that  for  rising  inputs  the  Miller 
effect  is  significant,  while  for  falling  inputs  the  Miller  effect  is  scarcely  noticeable. 
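The Miller effect described above can be made concrete with a short charge balance. The sketch below is the standard textbook approximation, not the paper's fitted macromodel. It treats c_gs and c_gd as constants over the interval of interest (which the preceding paragraph notes is only approximately true), and the symbols \Delta v_{IN} and \Delta v_{OUT} for the input and output voltage changes over that interval are notation introduced here.

```latex
% Charge the driver must supply while the input moves by \Delta v_{IN}
% and the output moves by \Delta v_{OUT} (negative for a falling output):
\Delta Q = c_{gs}\, \Delta v_{IN} + c_{gd}\, (\Delta v_{IN} - \Delta v_{OUT})

% Effective input capacitance = charge transferred / change in input voltage:
C_{in,eff} = \frac{\Delta Q}{\Delta v_{IN}}
           = c_{gs} + c_{gd} \left( 1 - \frac{\Delta v_{OUT}}{\Delta v_{IN}} \right)
```

When the output has not yet moved (\Delta v_{OUT} = 0) this reduces to c_gs + c_gd, and as more of the output swing overlaps the input transition the c_gd term grows, matching the qualitative behavior described for fast and slow inputs.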


Figure 6: Expected Input Capacitance


4.  General  Logic  Gates 

Inverters  are  but  one  of  a  myriad  of  logic  gates  found  in  circuit  designs.  In  this  section  we  will 
extend  our  discussion  to  cover  a  more  general  class  of  logic  gates.  We  limit  our  analysis  to  logic 
gates  with  a  single  active  input,  as  shown  in  Figure  7.  Transitions  at  multiple  inputs  are  not  supported 
by  our  abstract  model;  accurate  evaluation  of  their  effects  requires  a  low  level  simulator  that 
computes  node  voltages  and  branch  currents.  We  feel  that  this  represents  an  excessive  computation 
cost  and  therefore  choose  a  worst  case  gate  state  with  a  single  active  input  to  model  multiple  input 
transitions. 


We  will  derive  macromodel  equations  for  the  general  logic  gate  by  extending  our  inverter  equations. 
As  regards  the  objective  function — be  it  power  or  area— the  equations  are  basically  unchanged.  The 
power  consumption  of  an  nMOS  gate  is  still  proportional  to  the  shape  factor  of  the  pullup,  and  the 
power  or  area  consumption  of  a  CMOS  gate  is  still  dependent  on  the  stack  widths.  However  the 
equations  for  the  output  waveform  and  input  capacitance  require  moderate  extensions. 


Figure 7: General Logic Gate


4.1. Output Waveform

Additional  transistors  in  a  logic  gate  introduce  two  complications.  If  they  are  part  of  the  path  that 
switches  the  output  by  forming  a  path  to  VDD  or  ground,  their  resistance  and  capacitance  impedes  the 
output  transition.  If  they  are  included  in  a  side  path  that  does  not  connect  the  output  to  a  supply  rail, 
their  channel  capacitance  could  add  to  the  load  capacitance  and  hinder  the  output  transition.  During 
the  output  switching  transient,  transistors  with  high  inputs  are  predominantly  in  their  linear  region. 
Hence  we  model  them  as  RC  lines  formed  of  their  drain  to  source  resistance  and  channel  to  gate  and 
substrate  capacitance. 

Figure  8  contains  an  example.  Note  that  while  the  top  transistor  in  the  right  pulldown  branch  is  not 
in  the  conducting  path,  its  capacitance  adds  to  the  total  load.  The  general  situation  is  depicted  in 
Figure  9. 

The additional transistors affect the gate's response in two ways. For fast inputs, the switching
transistor can still be modeled as a resistor together with its capacitances, but a closed form
expression for the output waveform cannot be derived. We instead find the approximate response by
using an approach first proposed by Elmore [8] and now used in waveform estimation and bounding
work in MOS circuits [9, 10]. This approximates the true response as a single time constant
exponential. For slower inputs
the  speed  of  the  output  transition  is  limited  by  the  slope  of  the  input  and  transconductance  of  the 
switching  transistor.  Consequently,  transistors  in  the  conducting  path  which  are  electrically  after  the 
switching transistor have small v_DS and we can neglect their voltage drops. However, we must add
their  capacitances  (along  with  those  of  any  transistors  connected  to  them)  to  the  total  load 
capacitance.  Transistors  which  are  before  the  switching  transistor  do  not  impose  any  additional  load 
since  their  capacitances  are  already  discharged;  nonetheless  their  resistance  will  decrease  the 
switching  transistor's  effective  gm  if  they  are  in  the  conducting  path,  impeding  the  output  transition. 
This effect is illustrated in Figure 10. The effective g_m has been reduced to g_m / (1 + g_m r_b).
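Both ingredients of this paragraph are easy to sketch numerically. The code below is a minimal illustration, assuming the conducting path is a simple RC ladder (one resistance and one grounded capacitance per section); the function names `elmore_delay` and `effective_gm` are hypothetical and do not come from the macromodeling package.

```python
from typing import Sequence


def elmore_delay(resistances: Sequence[float], capacitances: Sequence[float]) -> float:
    """Elmore delay at the far end of an RC ladder.

    Section k has series resistance resistances[k] followed by a grounded
    capacitance capacitances[k]. Each capacitance is weighted by the total
    resistance between it and the driving source.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistance")
    delay = 0.0
    r_upstream = 0.0
    for r, c in zip(resistances, capacitances):
        r_upstream += r          # resistance from the source to this node
        delay += r_upstream * c  # this node's contribution to the time constant
    return delay


def effective_gm(gm: float, r_b: float) -> float:
    """Transconductance reduced by a series resistance r_b: gm / (1 + gm * r_b)."""
    return gm / (1.0 + gm * r_b)
```

For a two-section ladder with resistances 1 and 2 and capacitances 3 and 4, the delay is 1·3 + (1 + 2)·4 = 15 in the corresponding units. A transconductance of 2 with r_b = 0.5 is halved to 1.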

Combining  our  results,  we  obtain  the  following  equations  for  the  output  response  of  the  general 
nMOS logic gate. For each equation, the macromodeling support package finds the parameters a_i, b_i,
etc. that provide the closest agreement with SPICE data.

Figure 9: Circuit Model for a General Logic Gate


Figure 10: Reduction of Effective Transconductance


Let w_pd = width of the switching device

    w_pdtotal = effective total w_pd of devices in the conducting path
                (treat as if the w_pd's were conductances)
    w_pdbefore = effective total w_pd of devices before the switching
                 transistor in the signal path


pd  driver 
Cpdqfttr 


~  '  pddnvtr  Cpdqfltr  ^~L 
=  CXWpd  • 
Delay until output begins to fall


    T_{BEFout} = T_{BEF0} + m \, T_{SWRin}


TBEO  ~  ai  +  bi  w  C total 


m  =  i  +  i—e-  +  d.Jsa 


pd  total  pd 

Switching time of falling output transition

    T_{SWFout} = T_{SWF0} + m \, T_{SWRin}

bl 

1 swfo  ~  fli  +  Elmore  delay  approximation  with  r ■  =  —  for  each  conducting  pulldown 

v 

m  =  di  +  di-f  +  C,o,ai (“* +  ^—L_) 
pd  pd  Wpdbt/ort 


Delay  until  output  begins  to  rise 


    T_{BERout} = T_{BER0} + m \, T_{SWFin}


TBEO  =  fli  +  iic  C total 


m„d  +  dJs^  +  dSst! 
Switching time of rising output transition

W - trnz-rJBS  -  -  +  (Tsw,0 + *  w 

(TSWRO  ~  ^sw/to*  +  mTSWfU 

rsiyjt0  »  tfj  +  Elmore  delay  approximation  with  r  *  —• 

> 

V  V 
4.2. Input Capacitance

As we have seen, the addition of extra transistors to form more complicated logic functions has an
effect on a gate's output response. We have examined the effective capacitance of an nMOS inverter
during the beginning and switching phases of the rising and falling input. Of these four modes, only
one depended on the output waveform. The input capacitances during the beginning portion of the
input transition had no dependence because the output was still stationary. The capacitance during
the switching portion of a falling input had none because the average input to output coupling
capacitance dropped as the input waveform slowed down, leading to no net Miller effect. Hence the
input capacitance for these three modes depends only on the pulldown transistor's size, being
proportional to the transistor's width (assuming a fixed channel length). Only the switching portion of
the rising input possesses a significant output waveform dependence. To account for this
dependence we must analyze the conducting path containing the switching transistor. Figure 11
shows an abstract gate model along with its circuit level representation. We model 'on' transistors as
resistors (linear region approximation) and have added the appropriate capacitances from
nonconducting paths to the total load capacitance.


We  find  that  rb  causes  a  significant  drop  in  the  input  capacitance.  This  fact  has  been  exploited  for 
many  years  by  amplifier  designers  to  raise  input  impedance  and  thereby  improve  the  transfer 
characteristic. For very fast inputs, the contribution of c_gs is reduced by a factor (1 + g_m r_b). For very
slow  inputs  node  a  will  have  dropped  to  by  the  completion  of  the  input  transition;  hence  the  input 
capacitance  is  identical  to  that  of  an  inverter.  For  moderate  inputs  the  input  capacitance  is  in  the 
transition  region  from  fast  to  slow  input  behavior.  Since  the  total  capacitance  increase  during  the 
transition  region  from  fast  to  slow  inputs  grows,  the  slope  of  the  transition  increases. 


Figure 11: Circuit Model for Input Capacitance

Our  analysis  yields  the  following  equation  for  CSWRin: 

C  -c  +AC  { _ —sw™ _ ) 

'-SMm  <'SWROi»  +  mT  ' 

^SWRin  T  m,SWRin 

^ SWRm  *  ^SWRaam  ~  ^ SWROin 
5. Implementation

We have developed a general purpose macromodeling software package. The modeler takes as
input cell template files, a macromodel control file, and macromodel equations. Each cell template
file contains a logic cell's general topology. The macromodeler inserts values for device sizes,
capacitive loads, and input waveforms into the template, and then runs SPICE on the resulting circuit.
The values of the input capacitance and output waveform are extracted from the SPICE output and
stored. This process repeats for every combination of device sizes, loads, and input waveforms
specified in the control file. At present 216 SPICE runs are performed for the general logic gate
analyzed in this section. The particular logic cells used are inverters and NAND gates. Owing to the
simplicity of the cells, the SPICE simulations are quite fast, each requiring about ten cpu seconds on a
DEC 20/60.

Once the data points have been obtained the macromodeler solves for the constants in the
macromodel equations by using nonlinear curve fitting algorithms. We minimize the sum of squared
error; minimizing the maximum error might also be acceptable but it is too sensitive to noise in the
data. The curve fitter uses a Davidon-Fletcher-Powell algorithm [11] with modifications to accept
upper and lower bounds on the parameters [12]. This is essential for ensuring that the final equations
make physical sense. Otherwise local minima in the error function could draw the curve fitter toward
nonphysical values for the constants. Local minima in the error function also mandate that higher
order effects be successively included in the model equations. That is, we solve for the first order
terms in the equations first and then progressively solve for higher order terms. For example, when
curve fitting the macromodel equation for output switching times, we first start with the simple RC
model. We select a subset of data points with fast inputs and large capacitive loads (those points
for which the model is most accurate) and solve for the RC terms in the equation. We lock these
parameters and then solve for the waveshape terms. Next we solve for self capacitance terms. Finally
we unlock all parameters and curve fit again. This technique helps to ensure that we reach the global
minimum of the error function and adds very little to the total computation time because the time is
dominated by the SPICE runs.
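The staged fitting idea (fit the first order RC terms on the fast-input points, lock them, then fit the waveshape term) can be sketched as follows. This is a deliberately simplified stand-in. It assumes a hypothetical linear model t_out = a + b·c_load + m·t_in, uses closed-form least squares instead of the Davidon-Fletcher-Powell algorithm, enforces bounds by simple clamping rather than true bounded minimization, and omits the final step of unlocking all parameters. All names here are illustrative.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]  # (c_load, t_in, t_out)


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def clamp(value: float, low: float, high: float) -> float:
    """Keep a fitted constant inside its allowed (physical) range."""
    return max(low, min(high, value))


def staged_fit(points: List[Point], fast_limit: float,
               bounds: Dict[str, Tuple[float, float]]) -> Tuple[float, float, float]:
    """Fit a and b on fast-input points, lock them, then fit m on all points."""
    # Stage 1: only points whose input is fast enough for the RC model to hold.
    fast = [(c, t_out) for c, t_in, t_out in points if t_in <= fast_limit]
    a, b = fit_line([c for c, _ in fast], [t for _, t in fast])
    a = clamp(a, *bounds["a"])
    b = clamp(b, *bounds["b"])
    # Stage 2: a and b are locked; least squares for the waveshape slope m.
    num = sum(t_in * (t_out - a - b * c) for c, t_in, t_out in points)
    den = sum(t_in * t_in for _, t_in, _ in points)
    m = clamp(num / den, *bounds["m"])
    return a, b, m
```

Fitting the easy terms first gives the later stages a starting point near the physically meaningful solution, which is the purpose the text ascribes to the ordering.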


The  modeler  is  written  in  a  computer  language  called  CLU  [13].  It  consists  of  SPICE  interface, 
minimization,  and  matrix  manipulation  program  modules.  These  modules  contain  3200,  1800,  and 
1000  lines  of  code,  respectively.  All  told,  the  modeling  support  routines  comprise  about  6000  lines  of 
CLU  code;  the  modeling  programs  specific  to  the  general  logic  gate  discussed  in  this  section 
represent  an  additional  1700  code  lines. 

Pertinent  curve  fit  statistics  are  shown  in  Table  2.  The  macromodel  equations  are  typically  within 
several  percent  of  the  SPICE  predictions,  a  major  improvement  over  RC  models.  These  benefits 
come  at  a  small  price  in  computational  overhead  because  we  have  modeled  the  response  of  the  entire 
cell,  rather  than  using  a  more  sophisticated  transistor  model  and  then  having  to  compute  the 
transistors'  interactions  to  obtain  the  cell's  response.  The  accuracy  and  computational  speed  of  the 
macromodels  make  them  well  suited  for  both  simulation  and  optimization  applications. 


Rising Input, Falling Output

Model Eqn     % Error
              ave     max
TBERin        1.5     5.6
TSWRin        3.7    12.3
TBEFout       5.7    18.3
TSWFout       8.6    27.8

Falling Input, Rising Output

Model Eqn     % Error
              ave     max
TBEFin        1.5     6.9
TSWFin        1.3     9.7
TBERout       4.6    13.2
TSWRout       3.0    10.6

Table 2: Macromodel Curve Fit Accuracies
ABSTRACT

Power consumption and signal delay are crucial to the design of high
performance VLSI circuits. This paper presents a CAD tool for optimizing
digital MOS designs. The tool determines the transistor sizes that minimize
circuit power consumption subject to constraints on signal path delays.
Computational efficiency is obtained through macromodeling techniques
(described in a companion paper) and a specialized optimization algorithm.

The macromodels are based on device equations, and encapsulate logic gate
behavior in a set of simple yet accurate formulas. The optimization algorithm
exploits properties of the digital MOS domain to convert the primal optimization
problem into a dual form which is much easier to solve. The result is a
CAD tool which can optimize a circuit in roughly the amount of time needed to
perform a transistor level simulation of the circuit.


*This  work  was  supported  in  part  by  an  RCA  fellowship,  in  part  by  the 
Defense  Advanced  Research  Projects  Agency  of  the  Department  of  Defense  under 
Contract  No.  N00014-80-0622,  and  in  part  by  the  Air  Force  Office  of  Sponsored 
Research  under  Contract  No.  F49620-84-C-0004. 

**Department of Electrical Engineering and Computer Science, M.I.T.,
Room 36-575, Cambridge, MA 02139; (617) 253-8169.

Copyright (c) 1984, M.I.T. Memos in this series are for use inside M.I.T.
and are not considered to be published merely by virtue of appearing in this
series. This copy is for private circulation only and may not be further
copied or distributed. References to this work should be either to the
published version, if any, or in the form "private communication." For
information about the ideas expressed herein, contact the author directly.

For information about this series, contact Microsystems Program Office,
Room 36-575, M.I.T., Cambridge, MA 02139; (617) 253-8138.




Optimization  of  Digital  MOS  VLSI  Circuits 


1. Introduction

The design of a VLSI circuit is an enormous task. Sophisticated CAD tools are essential if designers
are to take full advantage of the power offered by fabrication technology. We describe a tool for
optimizing the performance of digital MOS circuits. This tool finds the transistor sizes that minimize
power consumption subject to constraints on signal path delays. The principal advantage is an
increase in designer productivity. At present, designers size transistors based on intuition and
numerous SPICE simulations. This process is so time consuming, for both man and machine, that
designers are hard pressed to arrive at any circuit that meets delay specifications and can rarely
afford the extra effort needed to minimize power consumption as well. This not only hinders the
design of the circuit at hand, it makes it difficult to compare alternate topologies for implementing
functional blocks, as the performance benefits offered by different topologies cannot be truly
ascertained unless the corresponding circuits have been optimized.

Another application is automatic module generation for silicon compilers. The module's transistors
must be properly sized in order to meet system performance specifications, but it would be
unthinkable to have a human perform the sizing. The task could involve thousands of transistors,
making it too mundane and complicated. A special purpose optimizer can accomplish the chore far
more efficiently.

Several authors have studied optimization work of this nature. General purpose optimization
packages such as DELIGHT [1] and APLSTAP [2] perform much of the work in the optimization
process. They iteratively improve the design solution as a designer would, but by employing
nonlinear optimization algorithms, choose the next solution point more accurately and efficiently than
a human could. The key advantage is that an optimal solution is reached. However, the optimization
process tends to be computationally expensive for a number of reasons. First, since the optimization
package is general purpose in nature, it cannot exploit properties of digital MOS logic and use
algorithms which would be more problem specific and hence potentially faster. Second, because the
optimization package is isolated from the circuit's data base, communicating solely via the simulator,
there is no provision to embed additional information in the data base which could assist the
optimization, either to allow one to access information more readily or to apply a more efficient
algorithm. Third, the circuit's signal path delays must be determined fairly accurately; this generally
entails the use of a device level simulator such as SPICE, which is rather expensive computationally.
The consequence of these three factors is that general purpose optimizers are typically restricted to
circuits with at most about thirty design parameters.

In an effort to address larger designs, other workers have investigated more specialized techniques
[3]. By using a resistive model for transistors and neglecting the changes in a logic gate's input
capacitance induced by sizing its transistors, these workers were able to greatly simplify the
optimization problem. They reformulated the original problem, a minimization subject to nonlinear
constraints, as an unconstrained minimization. This allows for much simpler optimization algorithms,
leading to fast convergence times. Nonetheless, the simplifications needed to reformulate the
problem seriously reduce the accuracy of both the power minimization and the satisfaction of the
delay constraints, making the approach inappropriate for high performance circuit designs.

Other  authors  have  aimed  for  fast  computation  times  by  simplifying  both  the  logic  gate  models  and 
the  optimization  techniques.  Examples  are  TV  [4]  and  Andy  [5].  These  tools  use  resistor  models  for 
transistors  instead  of  the  computationally  expensive  device  level  models.  Heuristics,  rather  than 
nonlinear  optimization  algorithms,  guide  the  sizing  of  transistors  in  critical  paths.  In  particular,  TV 
speeds  up  paths  by  widening  the  transistors  of  slow  logic  gates,  while  Andy  uses  a  fixed  sizing  ratio 
from  gate  to  gate  when  a  chain  drives  a  large  capacitive  load.  Although  these  approaches  are 
computationally  fast  enough  to  be  applied  to  large  circuits,  our  problem  domain  requires  more 
accuracy  and  efficiency.  The  resistor  model  is  not  accurate  enough  for  high  performance  design, 
and  iteratively  applying  heuristics  is  not  as  efficient  as  nonlinear  optimization  algorithms  that 
simultaneously  consider  all  critical  paths. 

2.  Overview  of  Paper 

This  paper  presents  a  novel  approach  to  the  transistor  sizing  problem.  We  attack  the  competing 
needs  for  accuracy,  computational  speed,  and  a  nearly  optimal  solution  by  combining  the  benefits  of 
the  previous  approaches  we  examined.  Like  TV  and  Andy,  we  work  at  a  higher  level  of  abstraction 
than  SPICE,  transcending  the  details  of  actual  transistor  operation.  However  we  acquire  additional 
computational  speed  by  modeling  entire  logic  cells  rather  than  just  individual  transistors.  Like  the 
general  purpose  optimizers,  we  employ  nonlinear  optimization  techniques.  This  helps  to  assure  that 
we  reach  an  optimal  solution  in  an  efficient  fashion.  We  exploit  properties  of  digital  MOS  circuits  and 
apply  a  specialized  algorithm  to  the  problem,  yielding  striking  improvements  in  computational  speed. 

Section 3 outlines the special features of the optimization problem, describing the properties of the
objective and constraint functions. Section 4 presents the theory of the optimization algorithms. We
choose a method particularly suited to our problem, taking advantage of the properties of the digital
MOS domain, and of our ability to create a circuit data base customized for the transistor sizing
problem. Our approach, called duality, allows us to partition the problem into many simpler, smaller
subproblems, and to transform the nonlinear delay constraints into box constraints (e.g. constraints of
the form x > 0). Section 5 discusses the implementation of the optimizer. We describe the
organization of the software and study the optimizer's performance on some example circuits.

3.  Properties  of  Our  Problem 

We begin by choosing an optimization technique which is appropriate to our problem. Selection of
the technique is highly problem dependent, as "appropriateness" in nonlinear optimization is nearly
synonymous with fast computation times, requiring that the optimization technique be closely
matched with the problem's characteristics. We therefore commence by considering the properties
of our optimization problem.

We desire to minimize a circuit's power consumption subject to constraints on signal path delays
and transistor sizes. The objective function, power, is the linear sum of the power consumptions of
each circuit cell. For nMOS the power consumption of each cell is linear in the shape factor of the
pullup transistor. For CMOS the power consumption is linear in the capacitive loads which must be
driven, which are due to the area of the transistor gates and interconnect capacitance, but also
depends somewhat on the input waveforms. Hence for nMOS, and nearly for CMOS, the power
consumption of a circuit is a separable function of the form

$$P_{total} = \sum_{i=1}^{N} P_i$$

where $P_i$ is a function of cell $i$ only and $N$ is the number of cells.

The problem's constraints are of two varieties: delay specifications and transistor size design rules.
The delay along a signal path is a nonlinear function of the circuit's transistor sizes, and is nonlocal,
being composed of contributions from each cell along the signal path. Fortunately transistor sizing is
very nearly a separable operation, because both waveshape and capacitive loading effects die off
rapidly with electrical distance. Consider an inverter chain. Whether the input signal is slow or fast,
by the time the waveform has propagated to the chain's output its shape will be predominantly
determined by the last gate in the chain. Fast inputs put the gate in an RC response mode where the
output waveform's switching time is governed by the gate's effective output resistance and capacitive
load. Slow inputs place the gate in a gain limited mode where the gate's gain increases the sharpness
of the waveform's transition. Thus a gate behaves as a crude wave shaper.

Capacitive loading effects also attenuate quickly with electrical distance. Suppose the chain is
driving a large capacitive load. The last gate will have to be fairly wide in order to drive the load. The
second to last gate will in turn have to be somewhat large to drive the wide pulldown transistor of the
last gate. We need progressively less widening as we work our way backwards from the load. Within
a few gates we reach a point where we are fully shielded from the size of the load.
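
The shielding effect can be illustrated with a short sketch. It is not the sizing procedure of the optimizer; it assumes a hypothetical fixed fan-out rule in which a gate of size s presents s units of input capacitance and can drive at most `ratio` times that amount. The function name `taper_chain` and all numbers are illustrative assumptions.

```python
def taper_chain(load, n_gates, ratio=4.0, min_size=1.0):
    """Size a chain of gates from the load backwards under a fixed fan-out rule.

    Assumption (not from the paper): a gate of size s has input capacitance s
    and can drive up to ratio * s units of capacitance.
    """
    sizes = []
    cap_to_drive = load
    for _ in range(n_gates):
        # Smallest legal size able to drive the capacitance seen downstream.
        size = max(min_size, cap_to_drive / ratio)
        sizes.append(size)
        # The gate just sized becomes the load of the gate before it.
        cap_to_drive = size
    sizes.reverse()  # report sizes from chain input to chain output
    return sizes


if __name__ == "__main__":
    print(taper_chain(load=64.0, n_gates=5))
```

With a load of 64 units and a ratio of 4, only the last two gates are widened (to 4 and 16); the first three remain at minimum size, which is the sense in which early gates are shielded from the load.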

Design  rules  restrict  the  minimum  transistor  size.  For  nMOS  there  is  also  a  minimum  beta  ratio  (ratio 
of  pulldown  to  pullup  shape  factors)  requirement.  The  former  is  a  box  constraint;  the  latter  is  a  linear 
constraint.  The  constraints  on  a  circuit's  transistors  are  entirely  local  to  each  cell,  and  are  therefore 
separable. 

Accuracy  requirements  are  also  important,  and  exhibit  a  peculiar  ambivalence  in  our  problem.  The 
delay  specifications  on  the  signal  paths  must  be  met  to  the  full  accuracy  afforded  by  the 
macromodels.  However  the  power  minimization  is  less  critical.  We  can  tolerate  a  fair  amount  of  error 
in  minimizing  the  circuit's  power  consumption,  especially  if  the  inaccuracies  are  minor  and  are 
accompanied  by  large  savings  in  computation  time.  In  fact,  at  present  designers  use  only  crude 
heuristics  or  mostly  ignore  the  power  consumption  issue. 

In  summary,  the  problem  embraces  characteristics  ranging  from  the  trivial  to  the  extremely  difficult. 
The  objective  function  is  a  simple  summation  of  contributions  from  each  logic  cell,  each  contribution 
being  linear  in  the  cell's  transistor  sizes.  On  the  other  hand,  we  anticipate  hardship  with  the  delay 
constraints,  since  they  are  global  and  nonlinear.  Fortunately  there  are  only  a  few  of  them;  typically  a 
designer  will  specify  delays  for  only  about  ten  critical  paths  through  a  functional  block.  In  contrast, 
the  transistor  size  constraints  are  quite  simple,  consisting  of  linear  and  box  constraints.  However 
there  are  a  large  number  of  them,  at  least  one  for  every  transistor  in  the  circuit,  carrying  the  potential 
for  huge  run  times.  The  objective  and  constraint  functions  are  essentially  separable,  linearly 
composed  of  nearly  independent  contributions  from  the  circuit's  cells.  We  would  prefer  a  nonlinear 
optimization  algorithm  that  can  exploit  this  separability,  pursuing  a  divide  and  conquer  strategy  where 
the  problem  is  partitioned  into  many  smaller  subproblems.  This  segmentation  is  beneficial  because 
with  most  optimization  algorithms,  run  times  grow  superlinearly  with  the  number  of  design  variables. 
Thus  by  breaking  up  a  large  problem,  faster  run  times  can  be  achieved.  In  particular,  if  the  problem 
could  be  partitioned  down  to  the  cell  level,  the  size  of  the  vector  space  for  each  subproblem  would  be 
the  number  of  transistors  in  each  cell.  Small  vector  spaces  usually  imply  fast  run  times. 
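
The benefit of partitioning can be made concrete with a rough count. The sketch below assumes, purely for illustration, that the cost of one optimization grows as the cube of the number of variables; the exponent and the function names are assumptions, not measurements from the optimizer.

```python
def monolithic_cost(cell_sizes, exponent=3):
    """Cost of optimizing every transistor at once under an n**exponent model."""
    return sum(cell_sizes) ** exponent


def partitioned_cost(cell_sizes, exponent=3):
    """Cost of one pass in which each cell is optimized as its own subproblem."""
    return sum(n ** exponent for n in cell_sizes)


if __name__ == "__main__":
    cells = [4] * 250  # 250 cells of 4 transistors each, 1000 transistors total
    print(monolithic_cost(cells), partitioned_cost(cells))
```

For 250 cells of four transistors each, the monolithic count is 10^9 while one partitioned pass costs 16,000. A partitioned method must repeat such passes many times, so the real saving is smaller than this ratio, but the gap leaves considerable room for those repetitions.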

4.  Duality 

We carefully studied several optimization techniques, including feasible directions and penalty
methods. Neither of these methods is capable of exploiting the near separability of the problem.
Since we felt that partitioning was vital to achieving fast run times, we chose a technique called
duality, a fairly exotic approach in comparison to the other two methods. We shall see that the
computational efficiency that it affords more than compensates for its conceptual complexity. The
basic idea of duality is to form a so-called dual problem which can be significantly easier to solve than
the original, or primal, problem. In our case the primal is difficult to solve because of the global,
nonlinear delay constraints and large number of transistors to size.

ABSTRACT

We study the graph-theoretic problem of embedding a graph
in a book with its vertices in a line along the spine of the book and
its edges on the pages in such a way that edges residing on the
same page do not cross. This problem abstracts layout problems
arising in the routing of multilayer printed circuit boards and in
the design of fault-tolerant processor arrays. In devising an
embedding, one strives to minimize both the number of pages used
and the "cutwidth" of the edges on each page. Our main results
(1) present optimal embeddings of a variety of families of graphs;
(2) exhibit situations where one can achieve small pagenumber
only at the expense of large cutwidth, and conversely; and (3)
establish bounds on the minimum pagenumber of a graph based
on various structural properties of the graph. Notable in this last
category are proofs that (a) every n-vertex valence-d graph can
be embedded using $O(dn^{1/3})$ pages, and (b) for every valence d > 2,
for all large n, there are n-vertex valence-d graphs whose
pagenumber is at least

Duality offers several major advantages. First, the primal problem need not be feasible. This is a
strong possibility because high performance designs often push circuit topologies to the limits of their
performance. It is likely that a designer will specify delays for some signal paths which cannot be met.
In this event we desire that our CAD tool do its best to meet those speed specifications while
optimizing the power consumption of the other paths whose delay constraints can be met. Duality
achieves this goal. Second, inactive constraints pose no difficulty for duality. A designer specifies
maximum delays along signal paths. Due to paths sharing common portions, it is possible that one
path's delay specification will be exactly met while a companion path will be faster than required, and
yet this situation minimizes power consumption. This is essentially a recasting of the critical path
problem; the first path is one of the circuit's critical paths. Third, and perhaps most importantly,
duality can be extremely efficient computationally. This is due to two factors. Duality converts the
nonlinear delay constraints into simple box constraints, allowing us to apply fairly simple optimization
algorithms (which implies robustness as well) with quasi-Newton methods. The quasi-Newton
methods lead to fast convergence. Also, the dual approach permits us to exploit the separability of
the power and delay functions, enabling us to use a divide and conquer strategy where each cell is
optimized separately. Partitioning affords significant computation speed advantages.

As with any nonlinear optimization approach, the advantages are balanced by drawbacks. Duality is
not applicable to all problems; it works best for those satisfying a certain convexity requirement, a
property which digital MOS circuits possess. Another drawback is due to our partitioning approach
rather than duality itself. Although exploiting separability provides run time improvements, it
necessitates the maintenance of additional data in the circuit's data base, along with a close
interaction between the control structure and the data base. Partitioning the circuit into cells implies
incremental optimization of each cell in succession. This mandates a sophisticated data base, and
places profound requirements on the programming language used to implement the optimizer.

4.1. Lagrange Multipliers

Lagrange multipliers are the key to understanding duality. We shall explain their use and
significance through a simple example. Consider a chain of two inverters. The input is driven by a
source $v_{IN}$ through a resistor $R_S$; the output connects to a load capacitance $C_L$. We wish to constrain
the maximum delay of the chain. If we fix the width of the pullup transistor, the length of the pulldown,
and the beta ratio of the inverters, then we can treat the power consumptions of the inverters as the
only free variables because specifying the power consumption of either logic gate determines the
gate's transistor sizes. Let $p_1$ be the power consumed by the first gate and $p_2$ be that consumed by
the second, and suppose we desire the total delay $T_{total} = T_1 + T_2$ to be less than or equal to some
$T^*$.

This maximum delay specification places restrictions on the allowable power consumptions of the
gates. Certain regions in $(p_1, p_2)$ space will not meet the speed specification. For instance if the
shape factors of the transistors in the second inverter are too small, the inverter will not be able to
charge the capacitor $C_L$ quickly enough to satisfy the delay constraint. On the other hand, if the
shape factors are too large, implying a wide pulldown and hence a large input capacitance, the first
inverter will not be able to drive the second quickly enough. Of course the first inverter's shape
factors can be made larger to drive the extra load, but after a certain point the first inverter's input
capacitance becomes so large that the delay through $R_S$ precludes meeting the delay specification.
Since power consumption is linearly related to shape factor, the bounds on the shape factors imply
bounds on the power consumption. Similar reasoning applies to the power consumption of the first
inverter, giving us the forbidden zones (dashed lines) shown in Figure 1.

We can more precisely characterize the feasible set of power consumptions. We do this by
employing a simple RC model for the inverters, allowing us to derive an analytic expression for the
delay through the gates as a function of their power consumptions. The resulting constraint surface
$T_{total} = T_1 + T_2 = T^*$ is elliptical as depicted in the figure.

Figure 1 also shows the correlation between total power and path delay. The dotted lines are
contours of constant power. To meet the delay constraint we must stay within the elliptical region, but
the total power dissipation varies with position in the region. As we move toward the upper right of
the feasible set, the power dissipation increases. At the point Max we have reached the maximum
power consumption that will still allow us to satisfy the delay constraint. Here the delay and power
contours are tangent, and their gradients point in the same direction. If we instead work our way
toward the lower left of the feasible set, the total power dissipation decreases. When we reach the
point Min the dissipation will be at its lowest level that will still satisfy the delay constraint. Here the
delay and power contours are again tangent, but now their gradients point in opposite directions;
mathematically this can be expressed as

$$\nabla P = -\mu \nabla T$$

or

$$\nabla (P + \mu T) = 0$$

where $\mu > 0$.

The variable $\mu$ is called a Lagrange multiplier, and offers the key to solving our nonlinear optimization
problem.
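
The tangency condition can be checked numerically on a deliberately simplified stand-in for the inverter chain. The sketch below assumes a hypothetical separable delay model in which gate i contributes a delay k_i / p_i, so that it ignores the coupling between stages described above; the constants `k`, the function names, and the closed-form solution are all assumptions of this toy model, not results from the paper.

```python
import math


def optimum(k, t_max):
    """Minimum-power point for the toy problem: minimize sum(p_i)
    subject to sum(k_i / p_i) <= t_max.

    Setting the gradient of P + mu * T to zero gives p_i = sqrt(mu * k_i);
    substituting into T = t_max gives sqrt(mu) = sum(sqrt(k_i)) / t_max.
    Returns (p, mu).
    """
    root_sum = sum(math.sqrt(ki) for ki in k)
    mu = (root_sum / t_max) ** 2
    p = [math.sqrt(mu * ki) for ki in k]
    return p, mu


def stationarity_residual(k, p, mu):
    """Components of grad(P + mu * T) at p; all are zero at the optimum."""
    # dP/dp_i = 1 and dT/dp_i = -k_i / p_i**2.
    return [1.0 - mu * ki / pi ** 2 for ki, pi in zip(k, p)]


if __name__ == "__main__":
    p, mu = optimum([1.0, 4.0], 3.0)
    print(p, mu, stationarity_residual([1.0, 4.0], p, mu))
```

For k = (1, 4) and a delay limit of 3, the sketch gives powers (1, 2) with multiplier 1: the power gradient (1, 1) is exactly the negative of the multiplier times the delay gradient (-1, -1), as the condition above requires.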
Figure 1: Contours of Delay and Power

4.2.  Finding  the  Optimum 

We can acquire an understanding of how to find the optimum by applying a graphical approach to
the inverter chain. We are interested in the possible total power and total delay combinations that the
circuit can exhibit. In other words, we desire the locus of points $(T_{total}, P_{total})$ that will be generated if
we substitute all valid transistor size combinations into the circuit. This locus of points is called the
set of all possible pairs, and is displayed in Figure 2. The set's lower left boundary is the classic
power-delay tradeoff curve (bold line); it represents designs that offer propagation delays with the
lowest possible power consumption for those delays. Points toward the left of the curve are in the
high speed, high power region. As we move down the curve to the right, we trade off speed for
reduced power consumption, and eventually enter the low power, low speed region.

Figure 2: Set of Possible Points

Points  that  are  not  on  the  tradeoff  curve  correspond  to  nonoptimal  circuits.  These  circuits  either 
consume  more  power  than  an  optimal  circuit  with  the  same  delay,  or  are  slower  than  an  optimal 
circuit  with  the  same  power  consumption.  For  example,  suppose  the  inverter  chain  is  driving  a  large 
capacitive  load.  We  should  make  the  second  inverter's  shape  factors  relatively  large  in  order  to  drive 
the  load,  and  then  make  the  first  inverter  slightly  large  to  drive  the  wide  pulldown  of  the  second 
inverter.  If  we  reverse  the  ordering,  making  the  first  inverter  very  large  rather  than  the  second,  the 
circuit  will  still  consume  the  same  amount  of  power  as  the  optimal  one,  but  will  be  considerably 
slower. 
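
A small numerical sketch makes the ordering example concrete. It assumes a simple RC model in which a gate of power (size) p has input capacitance proportional to p and output resistance inversely proportional to p; the function name `chain_delay` and all component values are illustrative assumptions rather than figures from the paper.

```python
def chain_delay(p1, p2, r_source=1.0, c_load=100.0, r_unit=1.0, c_unit=1.0):
    """RC delay of a two-inverter chain whose gates have sizes p1 and p2.

    Assumed model: input capacitance c_unit * p, output resistance r_unit / p.
    """
    t_source = r_source * (c_unit * p1)      # source resistor charging gate 1
    t_first = (r_unit / p1) * (c_unit * p2)  # gate 1 driving the input of gate 2
    t_second = (r_unit / p2) * c_load        # gate 2 driving the load
    return t_source + t_first + t_second


if __name__ == "__main__":
    print(chain_delay(2.0, 8.0))  # larger gate next to the load
    print(chain_delay(8.0, 2.0))  # same total power, ordering reversed
```

Under these assumed values both assignments use 10 units of power, but the delay rises from 18.5 to 58.25 time units when the large gate is placed first, so the reversed design lies well inside the set rather than on the tradeoff curve.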

Our  delay  specification  restricts  the  points  that  we  can  accept  to  those  having  a  total  delay  less  than 
or  equal  to  T*.  We  can  focus  our  attention  on  this  subset  by  shifting  the  vertical  axis  as  shown  in 
Figure  3.  Points  to  the  left  of  the  axis  have  delays  which  are  faster  than  T*;  this  subset  is  called  the 
feasible  region.  The  optimum  is  the  point  in  this  region  with  the  lowest  power  consumption,  and  is 
located  at  (0,  P*)  in  the  figure. 


We must somehow reach this optimum point starting from an arbitrary point in the set of possible
pairs. The approach duality takes can be thought of as a two step process, illustrated in Figure 3. The
first step is to move to and remain on the lower boundary of the set. The second is to walk along this
boundary to the optimum. Note that while conceptually this process may be interpreted as two steps,
it must be implemented as an inner loop embedded in an outer loop. Step one corresponds to the
inner loop, and step two to the outer. This forces the search to follow along the lower boundary of the
set.

Figure 3: Reaching the Optimum


We will now describe the implementation of each loop. Figure 4 gives a graphical representation of
the inner loop. Suppose that we begin at some arbitrary assignment $x$ of transistor sizes, with some
arbitrary nonnegative Lagrange multiplier vector $\mu$. The transistor sizes $x$ map to point $(g(x), P(x))$ in
g-P space, where $g(x)$ is the delay constraint function measured from the shifted axis (for the
inverter chain, $g(x) = T_{total}(x) - T^*$). We can move from this point to the lower boundary of the
achievable set by sliding the solid line down until it is tangent to the bottom of the set, while
preserving the slope of the line. By geometry we know that a line through a point $(g(x), P(x))$ with
normal $(\mu, 1)$ intersects the vertical axis at $P(x) + \mu g(x)$. The multiplier $\mu$ fixes the slope of the
line. Hence this sliding operation is equivalent to bringing the vertical intercept down while holding
$\mu$ fixed. We must perform the minimization

$$\min_{x} \{ P(x) + \mu g(x) \}$$

subject to $x$ lying in the set of valid transistor sizes. We shall denote this intercept as $\varphi(\mu)$, the
dual functional.

Figure 4: Inner Loop Minimization
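
The inner loop can be sketched on a toy problem. The code below is not the optimizer's implementation; it assumes a hypothetical separable model with power P = sum(p_i), delay T = sum(k_i / p_i), and a minimum size `p_min` for every cell, so that the minimization splits into one closed-form subproblem per cell. All names and constants are assumptions.

```python
import math


def inner_minimize(mu, k, t_max, p_min=0.1):
    """Minimize P(x) + mu * g(x) cell by cell for a fixed multiplier mu.

    Toy model (an assumption): P = sum(p_i), g = sum(k_i / p_i) - t_max,
    with box constraints p_i >= p_min > 0.
    Returns the minimizing powers and the intercept phi(mu).
    """
    # The Lagrangian separates: each cell minimizes p_i + mu * k_i / p_i,
    # whose unconstrained minimum sqrt(mu * k_i) is clipped to the box.
    p = [max(p_min, math.sqrt(mu * ki)) for ki in k]
    power = sum(p)
    g = sum(ki / pi for ki, pi in zip(k, p)) - t_max
    return p, power + mu * g


if __name__ == "__main__":
    for mu in (0.25, 1.0, 4.0):
        print(mu, inner_minimize(mu, [1.0, 4.0], 3.0))
```

For k = (1, 4) and a delay limit of 3, the minimum feasible power of this toy problem is 3. The intercepts for multipliers 0.25, 1, and 4 are 2.25, 3, and 0, so every intercept lies at or below the optimum and the largest one reaches it.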

The outer loop walks along the lower boundary toward the optimum. We can gain insight into how
this might be accomplished by contemplating the effect of different Lagrange multipliers on the inner
loop's minimization. Figure 5 provides an illustration. We see that as we move toward the optimum
point $(0, P^*)$ the intercepts increase in value until they reach $P^*$. Conversely, if we move away
from the optimum in either direction, the intercepts decrease. This is a maximization:

$$P^* = \max_{\mu} \varphi(\mu)$$

subject to $\mu \ge 0$.

This maximization gives us the Lagrange multiplier $\mu$ of the optimum, while the inner loop
minimization provides the optimal transistor size assignments.
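
The nesting of the two loops can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: it reuses a hypothetical separable toy model (power sum(p_i), delay sum(k_i / p_i), sizes at least `p_min`) and replaces the quasi-Newton outer iteration mentioned earlier with plain bisection on a single multiplier. All function names are assumptions.

```python
import math


def sizes_for(mu, k, p_min):
    """Inner loop: per-cell minimizer of p_i + mu * k_i / p_i with p_i >= p_min."""
    return [max(p_min, math.sqrt(mu * ki)) for ki in k]


def delay_slack(p, k, t_max):
    """g(x) = T_total - T*; a negative value means the path is faster than required."""
    return sum(ki / pi for ki, pi in zip(k, p)) - t_max


def dual_solve(k, t_max, p_min=0.1, mu_hi=1.0, tol=1e-10):
    """Outer loop: find the multiplier mu >= 0 that maximizes the intercept."""
    # g(x(mu)) is the slope of the dual functional at mu. If it is already
    # non-positive at mu = 0, the constraint is inactive and mu = 0 is optimal.
    if delay_slack(sizes_for(0.0, k, p_min), k, t_max) <= 0.0:
        return sizes_for(0.0, k, p_min), 0.0
    # Grow the bracket until the slope turns non-positive.
    while delay_slack(sizes_for(mu_hi, k, p_min), k, t_max) > 0.0:
        mu_hi *= 2.0
    mu_lo = 0.0
    while mu_hi - mu_lo > tol:
        mu = 0.5 * (mu_lo + mu_hi)
        if delay_slack(sizes_for(mu, k, p_min), k, t_max) > 0.0:
            mu_lo = mu  # still too slow: push harder for speed
        else:
            mu_hi = mu
    return sizes_for(mu_hi, k, p_min), mu_hi


if __name__ == "__main__":
    print(dual_solve([1.0, 4.0], 3.0))
```

Each outer step calls the inner minimization once, which is why the sizes returned always lie on the lower boundary of the achievable set. Bisection works here only because the toy problem has a single delay constraint; several constraints would need a multidimensional method.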

We can now grasp the intuitive significance of the Lagrange multiplier. From Figure 5 it is apparent
that as $\mu$ increases, the line becomes more vertical, and we move up and toward the left. Power
consumption increases whereas delay decreases. We are generating transistor size assignments that
push the circuit topology harder for speed. The fact that the multiplier has a concrete, practical
meaning is quite important, because it allows a designer to follow our CAD tool's "intent" as it
optimizes a circuit, showing the signal paths that are the most troublesome in meeting the delay
specifications. This knowledge is vital for directing efforts to improve the circuit, such as reduction of
interconnect capacitance and modification of circuit topologies (e.g. the insertion of super buffers).

Figure 5: Outer Loop Maximization

4.3.  Degenerate  Cases 

It  is  crucial  that  optimization  algorithms  perform  properly  even  when  faced  with  certain  degenerate 
conditions  in  the  delay  constraints,  such  as  inactive  or  infeasible  constraints.  Inactive  constraints  can 
come  from  one  of  two  sources:  (1)  a  delay  specification  on  a  signal  path  that  is  so  loose  that 
minimum  size  transistors  along  the  path  will  satisfy  it,  or  (2)  interactions  among  paths  give  rise  to  a 
situation  where  meeting  one  path's  constraint  causes  another's  to  be  inactive.  Of  these  two 
possibilities, the second is the more likely, and occurs frequently in practice. An inactive constraint
arises because the point of minimum power lies to the left of the constraint's vertical axis in
power-constraint space; the dual algorithm will converge to the optimum by driving the constraint's
Lagrange multiplier to zero.

Another important degenerate condition comes from infeasible delay constraints, where the
designer requests a maximum signal path delay that cannot possibly be met. These cases occur
frequently in high performance circuit design, as the designer pushes a circuit topology and
fabrication process to the limits of their performance. In these situations we desire that the
optimizer  do  the  best  that  it  can,  sizing  those  paths  with  infeasible  constraints  such  that  they  switch 
as  fast  as  possible,  and  sizing  paths  with  feasible  constraints  such  that  their  power  consumption  is 
optimized.  The  dual  algorithm  will  drive  the  former  paths'  Lagrange  multipliers  towards  infinity,  sizing 
the  transistors  for  maximum  speed.  Thus  the  algorithm  gives  useful  feedback  to  the  designer, 
indicating  the  maximum  speed  the  circuit  topology  can  provide. 
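
Both degenerate behaviors can be reproduced on a toy model. The sketch below assumes one hypothetical size variable x with power P(x) = x and delay 1/x, and uses a simple projected gradient ascent on the multiplier rather than the quasi-Newton method of the actual optimizer; the names and constants are illustrative assumptions.

```python
import math

X_MIN, X_MAX = 1.0, 4.0   # hypothetical box constraints on the size x

def inner_min(mu):
    """Minimizer of x + mu * (1/x - t_max) over the box (independent of t_max)."""
    return min(max(math.sqrt(mu), X_MIN), X_MAX)

def dual_ascent(t_max, steps=2000, rate=0.5):
    """Projected gradient ascent on mu >= 0; the dual's gradient is g(x(mu))."""
    mu = 1.0
    for _ in range(steps):
        x = inner_min(mu)
        mu = max(0.0, mu + rate * (1.0 / x - t_max))
    return mu, inner_min(mu)

# Loose specification: the minimum size already meets it, so the constraint is inactive.
mu_loose, x_loose = dual_ascent(t_max=2.0)
# Infeasible specification: even the maximum size has delay 1/X_MAX = 0.25 > 0.1.
mu_tight, x_tight = dual_ascent(t_max=0.1)
```

In the loose case the multiplier falls to zero and the size stays at its minimum; in the infeasible case the multiplier keeps growing with the number of steps while the size is pinned at its upper bound, which is the fastest circuit the toy model allows.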

4.4.  Restrictions 

As  we  mentioned  at  the  beginning  of  our  discussion,  although  duality  does  offer  significant 
advantages  over  other  optimization  methods,  it  is  limited  in  the  scope  of  objective  and  constraint 
functions  that  it  can  solve.  In  particular,  certain  objective  and  constraint  functions  can  produce  a 
condition known as a duality gap. These functions give rise to nonconvexities in the lower left
boundary of the achievable set, leading to a gap between the solution found by the dual algorithm and the true
optimum P*. We have never encountered a duality gap for any of the circuits we have optimized. The
power and delay equations describing digital MOS gates, and the separability inherent in the digital
MOS domain, make the occurrence of a gap unlikely and imply a small gap even if one should appear.
If a gap ever occurs, the circuit will still meet delay specifications, with a bounded amount of excess
power dissipation.

5.  Implementation 

We  have  seen  that  duality  maps  the  nonlinear  delay  constraints  into  simple  box  constraints  on 
Lagrange  multipliers.  This  mapping  leads  to  simple,  computationally  efficient  control  structures.  The 
outer  loop  maximization  uses  a  Davidon-Fletcher-Powell  quasi-Newton  method  [6]  with  modifications 
for  the  box  constraints  [7].  The  inner  loop  minimization  is  more  complicated  since  it  must  handle 
linear  as  well  as  box  constraints;  it  uses  an  algorithm  due  to  Bard  [8].  Since  both  loops  work  in  small 
vector spaces and use second derivative information, the optimizer runs very fast.

The  language  chosen  to  implement  the  optimizer  embodies  many  of  the  principles  of  data 
abstraction  and  object  oriented  programming.  These  features  were  essential  owing  to  our 
incremental  optimization  approach  and  the  hierarchical  nature  of  VLSI  design.  We  needed  a 
language  that  supported  automatic  dynamic  data  structure  allocation,  abstract  data  types,  implicit 
pointers, and recursive procedure calls and data structures. We chose the CLU programming
language [9], which runs on DEC 20's and VAX's. The language system has extensive compile time
type  checking  and  an  outstanding  interactive  debugger.  Both  greatly  facilitate  program  development. 
Another  good  choice  would  have  been  Zetalisp  on  a  Symbolics  3600.  The  optimizer  consists  of 
generic  nonlinear  minimization  and  maximization  routines,  a  circuit  optimization  support  package, 
and  routines  for  optimizing  generic  nMOS  logic  gates.  These  program  modules  represent  3500, 7400, 
and  2700  lines  of  CLU  code,  respectively. 


We  have  applied  our  optimizer  to  many  circuits;  here  we  present  two  representative  cases.  Our  first 
example  is  a  chain  of  three  inverters.  (The  circuit  is  simple  in  order  to  allow  a  comparison  with 
DELIGHT.)  We  began  with  minimum  size  transistors  and  requested  maximum  rise  and  fall  delays  of 
8.0 ns. Optimization statistics appear in Table 1. In the table, TBEout is the time until the output begins
to move in response to an input transition and TSWout is a measure of how quickly the output switches
once  it  does  begin  to  change.  The  optimizer  reached  a  solution  in  slightly  over  15  cpu  seconds  on  a 
DEC  System  20/60. 
Optimization Accuracy:

                                 cpu time [sec]
optimizer                     set up   optimization   power [mW]
DELIGHT (VAX 750)              133.7         3018.0         2.02
Present Work (DEC 20/60)         1.1           15.2         2.08

Delay Accuracies:

                     predicted [ns]        SPICE [ns]
Path                 TBEout  TSWout      TBEout  TSWout    error [%]   $\mu$ [mW/ns]
in -> out, rise        4.06    3.85        3.80    3.76       +5          0.403
in -> out, fall        5.23    0.64        5.53    0.77       +7          0.000

Total SPICE verification time (DEC 20/60 running FORTRAN): 16.5 cpu sec

Table 1: Optimization Statistics for the Inverter Chain


An attempt was made to run DELIGHT on the inverter chain and compare its results to those of our
optimizer, but the effort met with only partial success. The complex interactions among objective and
constraint functions overwhelmed DELIGHT's direction finding routine, causing the program to hang
up in infinite loops.^1 This illustrates how general purpose optimization algorithms can fail when faced
with a problem of this nature; special purpose algorithms are essential. To assist the direction finder,
we eliminated the maximum beta ratio and minimum shape factor constraints, and started DELIGHT at
an initial set of transistor sizes that was fairly close to the optimal solution. The problem was also
simplified by not evaluating the chain's rising input, falling output response. This did not affect the
final solution since this transition's delay constraint was not active, but it halved the number of SPICE
runs needed and reduced the strain on DELIGHT's direction finder. DELIGHT required five iterations
to converge to within five percent of the optimum, consuming 3018 cpu seconds on a VAX 11/750.
Table 1 gives the statistics.

^1 Bill Nye, the author of DELIGHT, believes that the problem lies in the direction finder's quadratic
programming subroutine. He is investigating more robust routines.

The  performances  of  the  circuits  produced  by  the  two  optimizers  are  quite  similar.  Both  have  falling 
input,  rising  output  delays  of  8.0  ns  as  requested,  with  power  consumptions  of  2  mW.  The  power 
consumption of DELIGHT's circuit is less than ours by about three percent, but this is mainly due to
the  removal  of  the  minimum  S  constraint  on  the  second  inverter. 


Our  optimizer  runs  considerably  faster  than  DELIGHT  with  SPICE.  It  is  difficult  to  make  an  exact 
comparison  of  how  fast  DELIGHT  would  run  on  the  DEC  20/60,  had  it  been  able  to  handle  the  inverter 
chain  without  simplifications,  but  we  can  make  fairly  accurate  estimates.  A  DEC  20/60  will  run 
Fortran  code  about  three  or  four  times  faster  than  will  a  VAX  750.  DELIGHT  only  evaluated  one  path 
transition,  with  fewer  transistor  size  constraints  and  an  initial  set  of  sizes  that  was  fairly  close  to  the 
optimum.  These  simplifications  halve  the  number  of  SPICE  simulations  per  iteration  and  reduce  the 
number  of  iterations  needed  to  reach  the  optimum,  leading  to  about  a  factor  of  five  improvement  in 
run  time.  Hence  we  believe  that  DELIGHT  would  require  about  4000  cpu  seconds  to  size  the  inverter 
chain  on  our  DEC  20.  This  is  about  300  times  slower  than  our  optimizer.  We  also  feel  that  our  run 
times scale better than DELIGHT's as circuit complexity increases. The partitioning scheme used by
our optimizer leads to approximately linear growth, while the growth rates of DELIGHT's feasible
directions  algorithms  and  SPICE's  simulation  algorithms  are  more  rapid. 
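
The run-time estimate above is simple arithmetic, reproduced here as a check. The sketch uses only the figures quoted in the text and in Table 1; the variable names are ours.

```python
vax_seconds = 3018.0                  # DELIGHT run time on the VAX 11/750 (Table 1)
speedup_low, speedup_high = 3.0, 4.0  # DEC 20/60 versus VAX 750, as stated in the text
simplification_factor = 5.0           # run-time saving attributed to the simplifications
our_seconds = 15.2                    # our optimization time on the DEC 20/60 (Table 1)

# Scale the VAX time to the DEC 20/60, then undo the benefit of the simplifications.
estimate_low = vax_seconds / speedup_high * simplification_factor
estimate_high = vax_seconds / speedup_low * simplification_factor

# Ratio of the estimated DELIGHT time to our optimization time.
ratio_low = estimate_low / our_seconds
ratio_high = estimate_high / our_seconds
```

The bracket that results, roughly 3800 to 5000 cpu seconds, contains the figure of about 4000 cpu seconds quoted above, and the corresponding ratios straddle the quoted factor of about 300.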

We  now  discuss  the  optimization  of  a  more  complicated  example,  a  four  bit  adder.  One  bit  of  the 
adder  is  shown  in  Figure  6.  The  adder  is  comprised  of  sixteen  logic  cells  having  a  total  of  72 
transistors.  Path  delay  constraints  were  placed  on  five  of  the  signal  paths.  Table  2  gives  the 
optimization  statistics.  Starting  with  minimum  size  transistors,  the  optimizer  required  only  520  cpu 
seconds  to  optimize  the  adder.  In  contrast,  considerably  more  time  was  needed  for  SPICE  runs  to  just 
verify  the  accuracy  of  the  predicted  delays. 
Figure 6: A Full Adder Module

Efficient algorithms are known for the simple linear programming problem

1. Introduction


Much  research  has  centered  on  the  problem  of  finding  shortest  paths  in  graphs.  It  is  well 
known  that  there  is  a  direct  correspondence  between  the  single-source  shortest-paths  problem  and 
the  following  simple  linear  programming  problem. 

Let $S$ be a set of linear inequalities of the form $x_j - x_i \le a_{ij}$, where the $x_i$ are unknowns and
the $a_{ij}$ are given real constants. Determine a set of values for the $x_i$ such that the inequalities
in $S$ are satisfied, or determine that no such values exist.

This paper considers the mixed-integer linear programming variant of this problem in which some
(but not necessarily all) of the $x_i$ are required to be integers. The problem arises in the context
of synchronous circuit optimization [8], but it has applications to PERT scheduling and VLSI
layout  compaction  as  well. 

Before formally defining the mixed-integer programming problem, we restate the linear programming
problem above in another form.

Problem L. Let $G = (V, E, a)$ be an edge-weighted, directed graph, where $V = \{1, 2, \ldots, |V|\}$
is the vertex set, the set $E$ of edges is a subset of $V \times V$, and for each edge $(i,j) \in E$ the edge
weight $a_{ij}$ is a real number. Find a vector $x = (x_1, x_2, \ldots, x_{|V|})$ satisfying the constraint that

$$x_j - x_i \le a_{ij}$$

for all $(i,j) \in E$, or determine that no feasible vector exists.
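
To make the graph form of Problem L concrete, the following sketch stores the inequalities as an adjacency list and checks whether a candidate vector is feasible. The function names and the three sample constraints are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

def build_constraint_graph(constraints):
    """Map each tail vertex i to a list of (j, a_ij), one per inequality x_j - x_i <= a_ij."""
    adj = defaultdict(list)
    for i, j, a in constraints:
        adj[i].append((j, a))
    return adj

def is_solution(adj, x):
    """Return True if x satisfies x_j - x_i <= a_ij for every edge (i, j)."""
    return all(x[j] - x[i] <= a for i, out_edges in adj.items() for j, a in out_edges)

# Hypothetical instance: x_2 - x_1 <= 3, x_3 - x_2 <= -1.5, x_1 - x_3 <= 0.5.
constraints = [(1, 2, 3.0), (2, 3, -1.5), (3, 1, 0.5)]
adj = build_constraint_graph(constraints)
feasible = is_solution(adj, {1: 0.0, 2: 3.0, 3: 1.5})
infeasible = is_solution(adj, {1: 0.0, 2: 4.0, 3: 0.0})
```

The adjacency list holds one entry per inequality, so its size is proportional to the number of edges rather than to the square of the number of vertices.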

The  graph  G  is  called  a  constraint  graph  for  the  linear  programming  problem.  There  are 
three  advantages  in  adopting  a  graph  representation  of  the  problem.  First,  an  adjacency-list 
representation  [1,  p.  200]  of  the  constraint  graph  G  is  more  economical  than,  for  example,  a  linear 
programming tableau or, when the graph has relatively few edges, a matrix of the $a_{ij}$. Second,
Problem L frequently arises in situations that are naturally described by a graph. Finally, the
graph-theoretic formulation helps in understanding the algorithms that solve this kind of problem.

A method for solving Problem L was discovered in the late 1950's by Ford and Bellman [7, p.
74]. Yen [12] gave some improvements to the Bellman-Ford algorithm as well as a cogent analysis
showing that its running time is $O(|V|^3)$. This bound is easily improved to $O(|V||E|)$ by using
an adjacency-list representation for the constraint graph.

The Bellman-Ford algorithm can also be used to solve the integer linear programming variant
of Problem L, in which all the $x_i$ are required to be integers. If the edge weights $a_{ij}$ all happen to
be integers, the Bellman-Ford algorithm will produce integer values for the $x_i$. If the $a_{ij}$ are not
integers, however, but the $x_i$ are required to be integers, each edge weight $a_{ij}$ may be replaced
by $\lfloor a_{ij} \rfloor$ without affecting the satisfiability of the inequalities.
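
The replacement of each weight by its floor rests on a one-line observation, sketched here for completeness (the numerical instance is ours): a difference of two integers is itself an integer, so it is bounded by $a_{ij}$ exactly when it is bounded by the largest integer not exceeding $a_{ij}$.

```latex
x_i, x_j \in \mathbb{Z}
\;\Longrightarrow\;
\bigl( x_j - x_i \le a_{ij} \iff x_j - x_i \le \lfloor a_{ij} \rfloor \bigr),
\qquad \text{e.g.}\quad x_j - x_i \le 2.7 \iff x_j - x_i \le 2 .
```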

The focus of this paper is the mixed-integer variant of Problem L.

Problem MI. Let $G = (V, V_I, E, a)$ be a "mixed-integer," edge-weighted, directed graph, where
$V = \{1, 2, \ldots, |V|\}$ is the vertex set, the set $V_I$ is a subset of $V$, the set $E$ of edges is a subset
of $V \times V$, and for each edge $(i,j) \in E$ the edge weight $a_{ij}$ is a real number. Find a vector
$x = (x_1, x_2, \ldots, x_{|V|})$ satisfying the constraints that

$$x_j - x_i \le a_{ij}$$

for all $(i,j) \in E$ and that $x_i \in \mathbb{Z}$ for all $i \in V_I$, or determine that no feasible vector exists.

The vector $x = (x_1, x_2, \ldots, x_{|V|})$ is called a solution to graph $G$, and if graph $G$ has a solution,
we say that $G$ is satisfiable. When it is clear from context, we use the same terminology for
Problem L.

In addition, we shall refer to the vertices in $V_I$ as the integer vertices of $G$ and the vertices in
$V_R = V - V_I$ as the real vertices of $G$. We also partition the set of edges into two sets depending
on whether the vertex at the head of the edge is integer or real:

$$E_I = \{ (i,j) \in E \mid j \in V_I \},$$
$$E_R = \{ (i,j) \in E \mid j \in V_R \}.$$

This paper presents two algorithms to solve Problem MI. The first, which runs in $O(|V|^2 |E|)$
time, is a straightforward extension of the Bellman-Ford algorithm. The second is more sophisticated
and has a running time of $O(|V||E| \lg |V|)$ for arbitrary graphs and $O(|V||E|)$ for dense
graphs. We conjecture that the $O(|V||E|)$ running time achieved by the Bellman-Ford algorithm
for the pure linear programming and pure integer programming versions of the problem is not
achievable in general for Problem MI.

The  remainder  of  this  paper  is  organized  as  follows.  Section  2  reviews  the  Bellman-Ford 
algorithm.  Section  3  presents  a  simple  relaxation  algorithm  for  solving  Problem  MI.  Section 
4 discusses two techniques, Dijkstra's algorithm and reweighting, which are used in Section 5
to construct an asymptotically efficient algorithm for Problem MI. We discuss applications and
present  some  concluding  remarks  in  Section  6. 

2.  Shortest  paths  and  the  Bellman-Ford  algorithm 

This  section  reviews  how  the  Bellman-Ford  algorithm  solves  Problem  L.  Although  the  results 
of  this  section  are  well  known  and  can  be  found  in  most  textbooks  on  combinatorial  optimization 
(see, for example, [7, p. 74]), we repeat the material here for the reader's convenience.

There is a natural correspondence between Problem L and the graph-theoretic single-source
shortest-paths problem. Let $G = (V, E, a)$ be an instance of Problem L. Suppose that for each
vertex $i \in V$, there is a path to $i$ from vertex 1, and let $d_i$ be the weight of a shortest (least-weight)
path from vertex 1 to vertex $i$. (At the end of the section, we shall discuss the case in which some
vertices are not reachable from vertex 1.) Then for any edge $(i,j) \in E$, we have $d_j - d_i \le a_{ij}$
since the edge $(i,j)$ can be appended to a shortest path from vertex 1 to vertex $i$ to produce a
path from vertex 1 to vertex $j$ of weight $d_i + a_{ij}$. Thus the shortest-path weights $d$ are a solution
to $G$.

Whenever $G$ is satisfiable, there are an infinite number of solutions. For example, if $x$ is a solution
to $G$, then uniformly adding any constant $k$ to each $x_i$ yields another solution $y$, where $y_i =
x_i + k$ for each $i \in V$. The assignment $x_i \gets d_i$ gives each $x_i$ its largest possible value subject to
the constraint that $x_1 = 0$. To see this, consider any path $p$ of weight $d_i$ from vertex 1 to vertex
$i$. If the inequalities associated with the edges of $p$ are summed, the unknowns associated with
the intermediate vertices cancel and the result is the inequality $x_i - x_1 \le d_i$.

Whenever the graph $G$ contains some cycle $c$ whose weight is negative, the shortest path
weight from vertex 1 to any vertex $i$ on cycle $c$ is undefined because the weight of any path
to vertex $i$ can be diminished by appending a traversal of $c$. In this case the graph $G$ is not
satisfiable. If the inequalities associated with the edges of $c$ are summed, all the unknowns $x_i$
cancel, and the resulting inequality asserts that 0 is less than or equal to the weight of $c$, which
is false.
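
As a small worked instance of this summation (the three-vertex cycle is our own illustration), take the cycle $1 \to 2 \to 3 \to 1$. Adding its three inequalities cancels every unknown:

```latex
\begin{aligned}
x_2 - x_1 &\le a_{12}, \qquad x_3 - x_2 \le a_{23}, \qquad x_1 - x_3 \le a_{31} \\
\Longrightarrow\quad 0 &\le a_{12} + a_{23} + a_{31}
\end{aligned}
```

If the cycle weight $a_{12} + a_{23} + a_{31}$ is negative, the last line is a contradiction, so no assignment of the $x_i$ can satisfy all three inequalities.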

The  Bellman-Ford  algorithm,  which  is  given  below,  solves  Problem  L  by  finding  the  weight 
of  the  shortest  path  to  each  vertex  from  vertex  1.  Should  the  graph  contain  a  negative-weight 
cycle,  the  algorithm  reports  that  the  graph  is  unsatisfiable  by  calling  the  procedure  Fail,  whose 
semantics  we  leave  unspecified. 

Algorithm BF (Bellman-Ford algorithm).

BF1. $x_1 \gets 0$;
BF2. for $i \gets 2$ to $|V|$ do $x_i \gets \infty$;
BF3. for $ind \gets 1$ to $|V| - 1$ do
BF4.     foreach $(i,j) \in E$ do
BF5.         if $x_j > x_i + a_{ij}$ then $x_j \gets x_i + a_{ij}$;
BF6. foreach $(i,j) \in E$ do
BF7.     if $x_j > x_i + a_{ij}$ then Fail;

For each vertex $j \in V$, the Bellman-Ford algorithm iteratively updates the weight $x_j$ of a
tentative shortest path from vertex 1 to vertex $j$. After initialization, the algorithm makes $|V| - 1$
passes through the edges in $E$ and relaxes each edge $(i,j)$ by computing $x_j \gets \min(x_j, x_i + a_{ij})$.

A simple analysis due to Yen [12] indicates why the Bellman-Ford algorithm works. The
weight $x_j$ converges to the weight $d_j$ of a shortest path from vertex 1 to vertex $j$ if the edges on
the path are relaxed in order along the path. The sequence of edges relaxed by the Bellman-Ford
algorithm consists of $|V| - 1$ copies of some ordering of $E$, and therefore contains every
vertex-disjoint path as a subsequence. If there are no negative-weight cycles in $G$, then every shortest
path is vertex disjoint, so each $x_i$ converges to the shortest-path weight $d_i$. On the other hand,
if there is a negative-weight cycle in the graph, the algorithm detects this condition by iterating
once more through all edges to see whether any of the inequalities remain unsatisfied.

The Bellman-Ford algorithm as given above determines the weight of the shortest path from
vertex 1 to each vertex, and therefore solves Problem L whenever all vertices of $G$ are reachable
from vertex 1. The code can be adapted to solve Problem L on arbitrary graphs by simply
changing the initialization step (lines BF1-BF2). In particular, if each $x_i$ is assigned a finite
initial value $u_i$, the relaxation in lines BF3-BF5 sets each $x_i$ to its maximum value subject to the
constraints that $x_j - x_i \le a_{ij}$ for each edge $(i,j) \in E$ and that $x_i \le u_i$ for each vertex $i \in V$.
Notice that whenever the constraint graph $G$ is satisfiable, it is satisfiable subject to the additional
constraints $x_i \le u_i$. Should the inequalities be inconsistent because there is a negative-weight
cycle in the graph, the relaxation will not converge to a solution, and the inconsistency will be
detected by the test in lines BF6-BF7.
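
Algorithm BF and the modified initialization just described can be sketched in Python as follows. This is an illustrative transcription, not code from the paper: the function name, the `upper` parameter, and the `Unsatisfiable` exception (standing in for the unspecified procedure Fail) are our assumptions.

```python
import math

class Unsatisfiable(Exception):
    """Raised in place of the paper's unspecified procedure Fail."""

def bellman_ford(num_vertices, edges, upper=None):
    """Solve x_j - x_i <= a_ij for edges (i, j, a_ij) on vertices 1..num_vertices.

    With upper=None, x_1 = 0 and every other x_i starts at infinity (BF1-BF2);
    otherwise each x_i starts at the finite value upper[i].
    """
    if upper is None:
        x = {i: math.inf for i in range(1, num_vertices + 1)}
        x[1] = 0.0
    else:
        x = dict(upper)
    for _ in range(num_vertices - 1):      # BF3: |V| - 1 passes
        for i, j, a in edges:              # BF4
            if x[j] > x[i] + a:            # BF5: relax edge (i, j)
                x[j] = x[i] + a
    for i, j, a in edges:                  # BF6
        if x[j] > x[i] + a:                # BF7: an inequality is still violated
            raise Unsatisfiable((i, j))
    return x

# Hypothetical instance in which every vertex is reachable from vertex 1.
edges = [(1, 2, 3.0), (2, 3, -1.5), (3, 1, 0.5), (1, 3, 2.0)]
shortest = bellman_ford(3, edges)
bounded = bellman_ford(3, edges, upper={1: 0.0, 2: 0.0, 3: 0.0})
```

With the default initialization the result is the vector of shortest-path weights from vertex 1; with all upper bounds set to zero the result is the largest solution in which no unknown exceeds zero.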


3.  Simple  relaxation  algorithms  for  Problem  MI 

As was mentioned in the introduction, Problem MI can be solved directly by the Bellman-Ford
algorithm when all unknowns are real (Problem L) and when all unknowns are integer.
The combination of integer and real unknowns, however, seems to make the problem harder.
In this section, we gain some intuition about the structure of Problem MI by introducing two
algorithms that solve it in much the same way as the Bellman-Ford algorithm solves Problem
L. The asymptotically efficient algorithm from Section 5 is derived from the second of these
algorithms.

A  natural  approach  to  solving  Problem  MI  is  to  see  whether  the  Bellman-Ford  relaxation 
approach  can  be  made  to  work.  Since  we  have  both  integer  and  real  vertices  in  the  graph, 
however,  we  must  modify  the  relaxation  step  BF5  in  the  Bellman-Ford  algorithm  to  produce  an 
integer  value  whenever  j  is  an  integer  vertex  (line  R6).  This  approach  does  in  fact  work,  but 
it  requires  more  iterations  than  the  simple  Bellman-Ford  algorithm.  The  next  algorithm  solves 
Problem  MI.  The  number  of  iterations  n  in  line  R2  will  be  determined  in  the  analysis  following 
the  algorithm. 

Algorithm R. (Relaxation.)

R1. foreach $i \in V$ do $x_i \gets 0$;
R2. for $ind \gets 1$ to $n$ do
R3.     foreach $(i,j) \in E$ do
R4.     begin
R5.         $x_j \gets \min(x_j, x_i + a_{ij})$;
R6.         if $j \in V_I$ then $x_j \gets \lfloor x_j \rfloor$;
R7.     end;
R8. foreach $(i,j) \in E$ do
R9.     if $x_j > x_i + a_{ij}$ then Fail;

In order to determine a value of $n$ such that Algorithm R works, we introduce the notion of
a reducing path. Let $p$ be a path starting at some vertex $k$, and suppose that $x_k$ is initially set to
0 and that all the remaining $x_i$ are initialized to $\infty$. Suppose the edges in path $p$ are traversed
in order starting from $k$, and each edge $(i,j)$ along the path is relaxed as in statements R5-R6.
If each relaxation of an edge $(i,j)$ reduces the value $x_j$, the path $p$ is called a reducing path.

Whenever  a  sequence  of  edges  contains  all  reducing  paths  as  subsequences,  the  relaxation  of 
each  edge  in  the  sequence  in  order  yields  a  solution.  (The  proof  is  analogous  to  Yen’s  analysis 
[12]  of  the  Bellman-Ford  algorithm.)  The  Bellman-Ford  algorithm  solves  Problem  L  because  in  a 
satisfiable  graph  with  only  real  vertices,  each  vertex  occurs  at  most  once  on  any  single  reducing 
path.  (And  in  fact,  every  shortest  path  is  a  reducing  path.) 

When  some  unknowns  are  integer  and  some  are  real,  however,  it  is  possible  for  a  reducing 
path  to  visit  the  same  vertex  more  than  once,  even  if  the  graph  is  satisfiable.  For  example,  in  the 
graph shown in Figure 1, the reducing path $p = 3 \to 2 \to 1 \to 2 \to 3 \to 4 \to 3 \to 2$ visits vertices
2 and 3 three times each. If all the $x_i$ are initially set to 0, the edges of $p$ must be relaxed in
their  order  along  the  path  to  achieve  convergence.  Moreover,  relaxing  the  entire  edge  set  in  some 
arbitrary  order  only  3  =  |V|  —  1  times  might  not  achieve  convergence.  Since  the  value  of  n  in 
line  R2  must  be  at  least  the  maximum  number  of  edges  in  any  reducing  path,  the  value  |V|  —  1, 
which  was  used  in  Algorithm  BF,  will  not  suffice. 

Fortunately,  reducing  paths  are  never  very  long  in  satisfiable  graphs  because  of  the  following 
lemma. 

Lemma 1. Suppose $G = (V, V_I, E, a)$ is satisfiable. If $p$ is a reducing path in $G$, then

1. $p$ visits no integer vertex more than once, and

2. $p$ never visits the same real vertex twice without visiting some integer vertex in
between.

Proof. If either condition is violated, then the reducing path $p$ can be extended indefinitely by
repeating the cycle that causes the violation. ∎

Lemma 1 allows us to determine a value for $n$ in line R2 of Algorithm R such that $x$
converges to a solution whenever $G$ is satisfiable. Any reducing path contains each integer vertex
at most once and each real vertex at most $|V_I| + 1$ times. Since the number of edges in a path is
one less than the number of vertices, any reducing path for a satisfiable graph can have no more
than $|V_I| + (|V_I| + 1)|V_R| - 1 = |V_I||V_R| + |V| - 1$ edges. Thus the limit $n$ of the outer loop
in Algorithm R should be set to $|V_I||V_R| + |V| - 1$.
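
With the iteration limit fixed by this bound, Algorithm R can be sketched in Python. The function name, the `Unsatisfiable` exception (standing in for Fail), and the two sample instances are our assumptions; the line labels in the comments refer to Algorithm R.

```python
import math

class Unsatisfiable(Exception):
    """Raised in place of the paper's unspecified procedure Fail."""

def relax_mixed(num_vertices, integer_vertices, edges):
    """Algorithm R on vertices 1..num_vertices with edges (i, j, a_ij)."""
    v_i = len(integer_vertices)
    v_r = num_vertices - v_i
    n = v_i * v_r + num_vertices - 1             # limit derived from Lemma 1
    x = {i: 0.0 for i in range(1, num_vertices + 1)}   # R1
    for _ in range(n):                           # R2
        for i, j, a in edges:                    # R3
            x[j] = min(x[j], x[i] + a)           # R5
            if j in integer_vertices:            # R6: round integer unknowns down
                x[j] = math.floor(x[j])
    for i, j, a in edges:                        # R8
        if x[j] > x[i] + a:                      # R9
            raise Unsatisfiable((i, j))
    return x

# Hypothetical satisfiable instance: vertex 2 is integer, vertices 1 and 3 are real.
solution = relax_mixed(3, {2}, [(1, 2, -0.5), (2, 3, 0.25), (3, 1, 1.0)])
```

An instance such as `relax_mixed(2, {1, 2}, [(1, 2, -0.5), (2, 1, 0.5)])` is satisfiable over the reals but not over the integers, and the final test reports it by raising `Unsatisfiable`.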

This  analysis  suggests  the  following  algorithm  which  is  slightly  more  efficient  than  Algorithm 
R,  and  which  forms  the  basis  of  the  asymptotically  efficient  algorithm  presented  in  the  next 
section. 

Algorithm M. (Modified relaxation.)

M1.  foreach $i \in V$ do $x_i \gets 0$;
M2.  for $ind \gets 1$ to $|V_R|$ do
M3.      foreach $(i,j) \in E_R$ do
M4.          $x_j \gets \min(x_j, x_i + a_{ij})$;
M5.  for $ind2 \gets 1$ to $|V_I|$ do
M6.  begin
M7.      foreach $(i,j) \in E_I$ do
M8.          $x_j \gets \min(x_j, \lfloor x_i + a_{ij} \rfloor)$;
M9.      for $ind \gets 1$ to $|V_R|$ do
M10.         foreach $(i,j) \in E_R$ do
M11.             $x_j \gets \min(x_j, x_i + a_{ij})$;
M12. end;
M13. foreach $(i,j) \in E$ do
M14.     if $x_j > x_i + a_{ij}$ then Fail;

The only difference between this algorithm and Algorithm R is that it treats the edges in $E_I$
separately from the edges in $E_R$. In lines M7-M8 of Algorithm M, each edge in $E_I$ is relaxed once.
There are $|V_I|$ such passes over $E_I$ which are preceded, followed, and separated by exhaustive
relaxations of the edges in $E_R$ (lines M2-M4 and M9-M11). In each exhaustive relaxation of $E_R$,
edges are relaxed until no further changes in the values of $x_j$ are possible for $j \in V_R$. (Actually,
the relaxations in lines M2-M4 and M9-M11 are only guaranteed to be exhaustive if there are
no negative-weight cycles in $E_R$. If there are cycles of negative weight, however, this condition
is detected at the end by the convergence test in lines M13-M14.)
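
Algorithm M translates into Python in the same way. In the sketch below the function name, the `Unsatisfiable` exception (standing in for Fail), and the sample instances are our assumptions; the edge set is split by the type of the head vertex, as in the definitions of $E_I$ and $E_R$.

```python
import math

class Unsatisfiable(Exception):
    """Raised in place of the paper's unspecified procedure Fail."""

def modified_relaxation(num_vertices, integer_vertices, edges):
    """Algorithm M on vertices 1..num_vertices with edges (i, j, a_ij)."""
    e_i = [(i, j, a) for i, j, a in edges if j in integer_vertices]
    e_r = [(i, j, a) for i, j, a in edges if j not in integer_vertices]
    v_i = len(integer_vertices)
    v_r = num_vertices - v_i
    x = {i: 0.0 for i in range(1, num_vertices + 1)}   # M1

    def relax_real_edges():                  # M2-M4 and M9-M11
        for _ in range(v_r):
            for i, j, a in e_r:
                x[j] = min(x[j], x[i] + a)

    relax_real_edges()
    for _ in range(v_i):                     # M5
        for i, j, a in e_i:                  # M7-M8: one pass over the integer-headed edges
            x[j] = min(x[j], math.floor(x[i] + a))
        relax_real_edges()
    for i, j, a in edges:                    # M13-M14
        if x[j] > x[i] + a:
            raise Unsatisfiable((i, j))
    return x

# Hypothetical satisfiable instance: vertex 2 is integer, vertices 1 and 3 are real.
solution = modified_relaxation(3, {2}, [(1, 2, -0.5), (2, 3, 0.25), (3, 1, 1.0)])
```

The repeated calls to `relax_real_edges` are the part that the more efficient algorithm replaces; each call costs a full Bellman-Ford style sweep over the real-headed edges.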

4.  Dijkstra’s  algorithm  and  reweighting 

Section 5 gives a more efficient algorithm to solve Problem MI than either Algorithm R
or Algorithm M. Two important techniques are used in the algorithm. The first is Dijkstra's
algorithm which finds shortest paths in a graph from a single source in the case when all the edge
weights are nonnegative. The other is reweighting, which is a technique due to Edmonds and
Karp [3] and used by Johnson [6] in his efficient algorithm for solving the all-pairs shortest-paths
problem.

Given a graph $G = (V, E, a)$ such that all edge weights $a_{ij}$ are nonnegative, Dijkstra's
algorithm computes for each vertex $i$, the weight $d_i$ of the shortest path from vertex 1. Because
each edge is relaxed exactly once, this algorithm is faster than the Bellman-Ford algorithm which
solves the same problem for arbitrary edge weights. Dijkstra's algorithm derives its efficiency from
the observation that along any shortest path from vertex 1, the shortest-path weights $d_i$ form a
nondecreasing sequence if all the edge weights are nonnegative. Thus, a sequence consisting of all
edges $(i,j) \in E$ in nondecreasing order of the distances $d_i$ contains as subsequences shortest paths
from vertex 1 to all vertices in $V$. Furthermore, such a sequence of edges can be computed on
the fly using a priority queue. (The textbook [1] gives a proof of correctness for this algorithm.)

Algorithm D (Dijkstra's algorithm).

D1.  $x_1 \gets 0$;
D2.  for $i \gets 2$ to $|V|$ do $x_i \gets \infty$;
D3.  $Q \gets V$;
D4.  while $Q \ne \emptyset$ do
D5.  begin
D6.      Choose $i \in Q$ such that $x_i = \min_{j \in Q} x_j$;
D7.      $Q \gets Q - \{i\}$;
D8.      foreach $j \in V_R$ such that $(i,j) \in E_R$ do
D9.          $x_j \gets \min(x_j, x_i + a_{ij})$;
D10. end;

If the set $Q$ in the algorithm is implemented as a standard priority queue, each extraction
(lines D6-D7) and update (line D9) costs $O(\lg |Q|) = O(\lg |V|)$ time. Thus the total running
time of Dijkstra's algorithm is $O(|E| \lg |V|)$. Johnson [6] shows that by implementing $Q$ as a
fixed-height heap [5], the running time can be brought down to $O(h|E| + h|V|^{1+1/h})$, where $h$ is
an integer constant that may be chosen after the input is presented. The choice $h = \lceil \lg |V| \rceil$ gives
the bound $O(|E| \lg |V|)$. For families of dense graphs where $|E| = \Omega(|V|^{1+\epsilon})$ for some constant
$\epsilon > 0$, the choice $h = \lceil 1/\epsilon \rceil$ gives an $O(|E|)$ bound.

Since Dijkstra's algorithm is equivalent to the Bellman-Ford algorithm on graphs with nonnegative
edge weights, it can be used to solve Problem L on such graphs. This is not very
interesting in itself, since any graph $G = (V, E, a)$ in which all edge weights are nonnegative
can be trivially satisfied by setting $x_i$ to 0 for each $i \in V$. Our interest in Dijkstra's algorithm
comes from a stronger property of the solutions it finds. Suppose the initialization step (lines
D1-D2) is changed so that each variable $x_i$ is initialized to a finite value $u_i$. Then the relaxation
procedure in lines D3-D10 will set each $x_i$ to its largest possible value consistent with the
constraints that $x_j - x_i \le a_{ij}$ for each edge $(i,j) \in E$ and that $x_i \le u_i$ for each vertex $i \in V$. In
other words, lines D3-D10 of Dijkstra's algorithm are functionally equivalent to lines BF3-BF5
of the Bellman-Ford algorithm provided that all the edge weights $a_{ij}$ are nonnegative. Since a
graph with only nonnegative edge weights can never contain a negative-weight cycle, no test for
convergence is necessary in this case.
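
The modified relaxation described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `relax_with_upper_bounds`, the zero-based vertex numbering, and the edge-list format `(i, j, a)` (meaning $x_j - x_i \le a$) are not from the original text, and a binary heap from the standard library stands in for the priority queue Q.

```python
import heapq

def relax_with_upper_bounds(n, edges, upper):
    """Return the largest x with x[j] - x[i] <= a for each edge (i, j, a)
    and x[i] <= upper[i] for each vertex. All weights a must be nonnegative."""
    adj = [[] for _ in range(n)]
    for i, j, a in edges:
        if a < 0:
            raise ValueError("Dijkstra-style relaxation needs nonnegative weights")
        adj[i].append((j, a))

    x = list(upper)                       # modified initialization: x_i starts at u_i
    heap = [(x[i], i) for i in range(n)]  # Q <- V
    heapq.heapify(heap)
    done = [False] * n
    while heap:
        xi, i = heapq.heappop(heap)       # vertex in Q with minimum x
        if done[i]:
            continue                      # stale heap entry for a finished vertex
        done[i] = True
        for j, a in adj[i]:
            if xi + a < x[j]:             # relax edge (i, j)
                x[j] = xi + a
                heapq.heappush(heap, (x[j], j))
    return x
```

With `upper = [0, 10, 10]` and edges `[(0, 1, 2), (1, 2, 1), (0, 2, 5)]`, the bound on vertex 0 propagates along the edges exactly as in an ordinary shortest-path computation from that vertex.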

The efficient algorithm we shall present to solve Problem MI is a modification of Algorithm
M. Notice that lines M9-M11 of Algorithm M exhaustively relax the edges in $E_R$ in a manner
similar to lines BF2-BF4 of the Bellman-Ford algorithm. In Algorithm M, however, this code is
executed many times. The efficient algorithm to solve Problem MI uses a trick to replace this
code with code based on the more efficient relaxation procedure in lines D3-D10 of Dijkstra's
algorithm. This trick is the technique of reweighting due to Edmonds and Karp [3].

Lemma 2. Let $G = (V, E, a)$ be an edge-weighted graph, for each $i \in V$ let $r_i$ be a real number,
and let $H = (V, E, b)$ where $b_{ij} = a_{ij} + r_i - r_j$ for each edge $(i,j) \in E$. For each vertex
$i \in V$ let $x_i$ be a real number and let $y_i = x_i - r_i$. Then $x_j - x_i \le a_{ij}$ for all $(i,j) \in E$ if
and only if $y_j - y_i \le b_{ij}$ for all $(i,j) \in E$ (that is, x is a solution to G if and only if y is a
solution to H).

Proof.  Trivial.  | 
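
For readers who want the omitted step spelled out, the following short derivation (a sketch using only the definitions $y_i = x_i - r_i$ and $b_{ij} = a_{ij} + r_i - r_j$ from the lemma) shows why the two systems of inequalities are equivalent edge by edge.

```latex
\begin{align*}
y_j - y_i &= (x_j - r_j) - (x_i - r_i) \\
          &= (x_j - x_i) + r_i - r_j, \\
\text{so}\quad y_j - y_i \le b_{ij}
  &\iff (x_j - x_i) + r_i - r_j \le a_{ij} + r_i - r_j \\
  &\iff x_j - x_i \le a_{ij}.
\end{align*}
```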

We call the vector $r = (r_1, r_2, \ldots, r_{|V|})$ a reweighting of the graph G.

5.  An  asymptotically  efficient  algorithm  for  solving  Problem  MI 

This section shows how Dijkstra's algorithm and reweighting can be incorporated into Algorithm
M to yield a faster algorithm for solving Problem MI. Given a graph $G = (V, V_I, E, a)$, the
idea is to find a reweighting r such that the reweighted graph $H = (V, V_I, E, b)$ has edge weights
$b_{ij} = a_{ij} + r_i - r_j \ge 0$ for all edges $(i,j) \in E_R$. Lemma 2 guarantees that G is satisfiable if and
only if H is satisfiable and also that a solution y to H can be converted into a solution x to G by
setting $x_i = y_i + r_i$ for each $i \in V$. The advantage gained by transforming the problem on G to
a problem on H is that the relaxation portion of Dijkstra's algorithm (lines D3-D10) can replace
the Bellman-Ford relaxation (lines M9-M11), which is the most expensive part of Algorithm M.

The first stage of the algorithm is to determine the reweighting values $r_i$ for all $i \in V$ and
the new edge weights $b_{ij} = a_{ij} + r_i - r_j$ for all $(i,j) \in E$. We must choose the values $r_i$ such
that $b_{ij} \ge 0$ for all $(i,j) \in E_R$. Since this is equivalent to requiring that $r_j - r_i \le a_{ij}$ for all
$(i,j) \in E_R$, values for the $r_i$ can be found by applying the Bellman-Ford algorithm to the graph
$(V, E_R, a)$. The first few lines of the algorithm are:

Algorithm T. (Efficient algorithm.)

T1.  for $i \in V$ do $r_i \leftarrow 0$;
T2.  for $ind \leftarrow 1$ to $|V_R|$ do
T3.      for $(i,j) \in E_R$ do
T4.          $r_j \leftarrow \min(r_j, r_i + a_{ij})$;
T5.  for $(i,j) \in E_R$ do
T6.      if $r_j > r_i + a_{ij}$ then Fail;
T7.  for $(i,j) \in E$ do
T8.      $b_{ij} \leftarrow a_{ij} + r_i - r_j$;

If the algorithm fails in line T6, then there is a cycle of negative weight among the edges in
$E_R$, and hence graph G is unsatisfiable even in the absence of integer constraints. Otherwise, the
values $b_{ij}$ computed in line T8 are nonnegative for all $(i,j) \in E_R$.
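
The reweighting stage (lines T1-T8) can be sketched as below. The names `reweight` and `reweighted_edges`, the zero-based vertex numbering, and the `(i, j, a)` edge format are assumptions made for this illustration; `real_edges` plays the role of $E_R$ and `num_real` the role of $|V_R|$.

```python
def reweight(n, real_edges, num_real):
    """Bellman-Ford passes over the real edges (lines T1-T6).

    Returns the reweighting vector r, or None if a negative-weight cycle
    among the real edges makes the system unsatisfiable."""
    r = [0] * n                               # T1
    for _ in range(num_real):                 # T2
        for i, j, a in real_edges:            # T3
            r[j] = min(r[j], r[i] + a)        # T4
    for i, j, a in real_edges:                # T5
        if r[j] > r[i] + a:                   # T6: not converged, so fail
            return None
    return r

def reweighted_edges(edges, r):
    """Lines T7-T8: replace each weight a_ij by b_ij = a_ij + r_i - r_j."""
    return [(i, j, a + r[i] - r[j]) for i, j, a in edges]
```

For example, with real edges `[(0, 1, -2), (1, 2, 3)]` the sketch yields `r = [0, -2, 0]`, and both reweighted edge weights are then nonnegative, as the text requires.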


The next stage of Algorithm T is to solve the mixed-integer problem on the graph
$H = (V, V_I, E, b)$ by alternately relaxing the edges in $E_I$ and the edges in $E_R$ as in Algorithm M. We
begin by initializing the values $y_i$, which will converge to a solution to H if H is satisfiable.

T9.  for $i \in V$ do $y_i \leftarrow 0$;

This initialization has the added benefit of subsuming the first exhaustive relaxation of $E_R$ (lines
M2-M4 in Algorithm M). After the execution of line T9 we have $y_j - y_i = 0 - 0 \le b_{ij}$ for all
$(i,j) \in E_R$, which means that the edges in $E_R$ are already exhaustively relaxed.

The  next  portion  of  Algorithm  T  parallels  lines  M5-M11  of  Algorithm  M  and  is  where  most 
of  the  computing  gets  done. 

T10. for $ind \leftarrow 1$ to $|V_I|$ do
T11. begin
T12.     for $(i,j) \in E_I$ do
T13.         $y_j \leftarrow \min(y_j, \lfloor y_i + b_{ij} \rfloor)$;
T14.     $Q \leftarrow V$;
T15.     while $Q \ne \emptyset$ do
T16.     begin
T17.         Choose $i \in Q$ such that $y_i = \min_{j \in Q} y_j$;
T18.         $Q \leftarrow Q - \{i\}$;
T19.         for $j \in V_R$ such that $(i,j) \in E_R$ do
T20.             $y_j \leftarrow \min(y_j, y_i + b_{ij})$;
T21.     end;
T22. end;

This  code  solves  the  problem  on  graph  H  in  almost  exactly  the  same  way  that  Algorithm  M 
would.  The  only  difference  is  the  method  by  which  the  edges  of  Er  are  exhaustively  relaxed. 
Whereas lines M9-M11 of Algorithm M perform the exhaustive relaxation using the Bellman-Ford
algorithm, lines T14-T21 of Algorithm T take advantage of the nonnegativity of the $b_{ij}$ for
$(i,j) \in E_R$ and use Dijkstra's algorithm.

The final part of Algorithm T is to check the convergence of the $y_i$ and to apply Lemma 2 to
produce a satisfying assignment x for the original graph G.

T23. for $(i,j) \in E_I$ do
T24.     if $y_j > y_i + b_{ij}$ then Fail;
T25. for $i \in V$ do
T26.     $x_i \leftarrow y_i + r_i$;

Lines T23-T24 check the convergence of y by testing the inequalities associated with the edges
in $E_I$. The inequalities resulting from edges in $E_R$ need not be checked because the relaxation
in lines T14-T22 is guaranteed to be exhaustive. (If there were negative-weight cycles in $E_R$, we
would have detected this in lines T5-T6.)

Lines T25-T26 convert the solution y to graph H into a solution x to graph G. Lemma 2
ensures that the inequalities $x_j - x_i \le a_{ij}$ are satisfied, but we must also show that the $x_i$ are
integers for all $i \in V_I$. For each $i \in V_I$ the value $y_i$ is an integer, however, and furthermore, the
values of the $r_i$ produced in lines T1-T4 are zero for all $i \in V_I$. Thus for all the integer vertices,
the $x_i$ are integers.

In  summary,  we  have  proved  the  following  theorem. 

Theorem  3.  Algorithm  T  solves  Problem  MI. 

The running time of Algorithm T is $O(|V||E| \lg|V|)$. (Johnson's techniques [6] [5] can be used
to reduce the actual running time to $O(|V||E|)$ for dense graphs by implementing the priority
queue Q as a fixed-height heap.) Tighter analysis in terms of the sizes of the sets $V_I$, $V_R$, $E_I$, and
$E_R$ is possible, however. In particular, the closer bound $O(|V_R||E_R| + |V_I||E_I| + |V_I||E_R| \lg|V|)$
indicates that the algorithm performs even better when the number of integer vertices is small.
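
Putting the three stages together, a minimal Python sketch of Algorithm T might look as follows. Several things here are assumptions rather than statements from the text: the function name `solve_mixed_integer`, zero-based vertices, the `(i, j, a)` edge format for $x_j - x_i \le a_{ij}$, the use of a binary heap for Q, and in particular the reading of $E_I$ and $E_R$ as the edges entering integer and real vertices respectively. Floating-point weights are used for brevity; exact rational arithmetic would be safer in practice.

```python
import heapq
import math

def solve_mixed_integer(n, integer_vertices, edges):
    """Return x satisfying x[j] - x[i] <= a for all (i, j, a), with x[v]
    integral for v in integer_vertices, or None if the sketch reports Fail."""
    is_int = [v in integer_vertices for v in range(n)]
    e_int = [e for e in edges if is_int[e[1]]]        # assumed E_I
    e_real = [e for e in edges if not is_int[e[1]]]   # assumed E_R
    num_int = sum(is_int)

    # Stage 1 (T1-T8): Bellman-Ford reweighting over the real edges.
    r = [0.0] * n
    for _ in range(n - num_int):
        for i, j, a in e_real:
            r[j] = min(r[j], r[i] + a)
    for i, j, a in e_real:
        if r[j] > r[i] + a:
            return None                               # negative cycle in E_R
    b_int = [(i, j, a + r[i] - r[j]) for i, j, a in e_int]
    adj_real = [[] for _ in range(n)]
    for i, j, a in e_real:
        adj_real[i].append((j, a + r[i] - r[j]))

    # Stage 2 (T9-T22): alternate integer relaxation and Dijkstra relaxation.
    y = [0.0] * n
    for _ in range(num_int):
        for i, j, b in b_int:                         # T12-T13
            y[j] = min(y[j], math.floor(y[i] + b))
        heap = [(y[v], v) for v in range(n)]          # T14: Q <- V
        heapq.heapify(heap)
        done = [False] * n
        while heap:                                   # T15-T21
            yi, i = heapq.heappop(heap)
            if done[i]:
                continue                              # stale entry
            done[i] = True
            for j, b in adj_real[i]:
                if yi + b < y[j]:
                    y[j] = yi + b
                    heapq.heappush(heap, (y[j], j))

    # Stage 3 (T23-T26): convergence check, then undo the reweighting.
    for i, j, b in b_int:
        if y[j] > y[i] + b:
            return None
    return [y[v] + r[v] for v in range(n)]
```

A small usage example: with vertices 0 and 2 integral, `solve_mixed_integer(3, {0, 2}, [(0, 1, 0.5), (1, 2, -0.7), (2, 0, 1.0)])` returns a feasible assignment, while tightening the last weight to `0.5` makes the integer problem infeasible even though the real relaxation is still satisfiable.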

6.  Applications,  extensions,  and  conclusions 

The solution to Problem MI was demanded by a problem concerning optimization of synchronous
circuitry by retiming [8]. This section briefly reviews this application, and gives two other
problems (compaction of VLSI circuits in the presence of power and ground busses, and PERT
scheduling with periodic constraints) which can be reduced to Problem MI. We also consider an
extension of Problem MI where multiple sets of periodic constraints must be satisfied. (For example,
some of the $x_i$ are required to be integers, and others to be exact multiples of a constant
c.) This section is abbreviated in the extended abstract.

Circuit optimization by retiming

This application is omitted. (The interested reader is referred to [8].)

PERT  scheduling 

Suppose we have a constraint graph representing milestones in a project, in which the edge weights
indicate the timing constraints between milestones. Generally, the Bellman-Ford algorithm can
be used to provide an optimal scheduling of the milestones. If a work day is from 9:00 a.m. to
5:00 p.m., however, we may not wish to schedule a one-hour job to start at 4:30 p.m. Advancing
the job to the next day, however, may cause another job to be advanced as well if the two jobs are
constrained to fall near each other. The problem of PERT scheduling with periodic constraints
can be cast as Problem MI.

Intuitively, the mixed-integer formulation allows one to include for each job (1) a (real) variable
representing the starting time of the job, and (2) an (integer) variable representing, say, noon on
the day the job occurs. Thus one can include constraints which say, "This job must finish before
5:00 p.m. on the day it occurs," and "These two jobs must start on the same day."

We  also  can  solve  certain  problems  when  there  are  additional  periodic  constraints  using  an 
algorithm  that  runs  in  0(JVJ  )  time.  As  an  example,  we  may  wish  to  have  not  only  variables 
representing  noon  on  the  day  that  a  job  starts,  but  also  variables  representing  the  week  that  a 
job  starts.  Thus  constraints  involving  weekends  could  be  taken  into  consideration. 

Circuit compaction

Optimal (one-dimensional) compaction of VLSI circuit layouts [4] is another application of the
Bellman-Ford algorithm. Each layout feature is given a variable representing an x-coordinate,
and the design rules are enforced using constraints of the form $x_j - x_i \le a_{ij}$. It may be desirable,
however, to allow feature i to be to the left of feature j or vice versa, but not to allow them
to occupy the same position. Unfortunately, if one wishes to allow this kind of transposition of
layout features, either optimality or performance must be sacrificed because the problem becomes
NP-complete [9]. But for certain compaction problems arising in practice, transposition of layout
features can be allowed.

Some  design  methodologies  enforce  the  placement  of  power,  ground,  and  clock  to  be  at  regular 
intervals. For example, one signal processing system [10] requires that these wires be repeated
every $200\lambda$, and that the width of all cells in the system be integer multiples of this distance.
The  designer  is  then  constrained  to  build  a  new  cell  so  that  the  layout  features  are  tightly  packed 
among  the  global  wires.  In  this  context,  where  some  layout  features  may  go  on  one  side  or  the 
other  of  some  global  wire  but  may  not  overlap,  the  compaction  problem  can  be  formulated  as 
Problem  MI. 

Abstract

Computationally simple bounds for signal propagation
delay in linear RC tree models for MOS
interconnect were derived in [1] and have proved
useful in timing analysis of digital MOS IC's [2-4].
We show that these bounds can be derived quite
simply as the payoff functions for a certain linear
optimal control problem and that they apply not only
to RC trees but to more general RC meshes as well.
Finally, two methods are given for tightening the
original bounds given in [1].

I.  Introduction 

In digital integrated circuits, signal propagation
delay through conducting paths with distributed
resistance and capacitance is frequently a
significant part of the total delay and grows in
relative importance as feature sizes shrink. Timing
analysis of digital IC's can be speeded up by using
approximate delay formulas, e.g., the "Elmore delay"
[5], in place of detailed numerical simulation for
interconnect paths. Bounds on the delay, applicable
to those paths that can be modelled as linear,
nonuniform branched RC ladder networks, i.e., "RC
trees," were derived in [1]. But, as discussed in
[6-8], certain circuits used in MOS logic cannot be
modelled as RC trees because they contain one or more
loops of resistors, as shown in Fig. 1. Several
examples of such circuits, called "RC meshes,"
arising in MOS logic networks are given in [6-7].
As used in this paper, the term RC mesh includes RC
trees as a special case.

This paper is concerned with bounds on signal
propagation delay in linear, lumped RC tree and mesh
networks driven by an ideal voltage source. Since
the meaning of "delay" is somewhat application-dependent,
we bound the delay by bounding the zero-state
step response at any output node of interest.


II.  Network  Differential  Equations  for  RC  Meshes 

Fig. 1: This linear RC mesh differs from an RC
tree because of the resistor loop.

Node  Numbering  Convention 

The ground node is not numbered. The node
connected to the voltage source is numbered 0.
The remaining nodes are numbered in any order from
1 to N, where N is the total number of capacitors,
as in Fig. 1.

We isolate the resistor subnetwork R containing
all the resistors and assign reference directions
to the capacitor currents $i_1, \ldots, i_N$ as shown in
Fig. 2. Let node 0 serve as the datum node of R.
The node voltages with respect to datum are given
in terms of the capacitor currents by the resistance
matrix $\mathbf{R}$ as shown below.

Of course the matrix $\mathbf{R}$ is symmetric since the subnetwork R is reciprocal, and
$\mathbf{R}$ is positive definite because all resistors are
assumed positive.

Consider the step response of the network with
zero initial conditions. Substituting $e = 1$ and
$i_k = -C_k \dot v_k$ into (1), we obtain the network
differential equations

    $1 - v_i(t) = \sum_{k=1}^{N} r_{ik} C_k \dot v_k(t), \quad i = 1, \ldots, N, \quad t \ge 0,$    (2)

which are identical in form to eq. (9) of [1]; the
only difference is that in [1] certain resistances
$R_{ki}$ appear in place of the $r_{ik}$'s above. The
resistances $R_{ki}$ were defined specifically in terms
of the topology of a tree in [1], while the $r_{ik}$'s
here are defined for an arbitrary mesh as the
elements of the resistance matrix. The reader can
easily verify that the two definitions agree in the
special case of a tree.

Fig. 2: The resistor subnetwork R extracted from
the circuit in Fig. 1 is described by a resistance
matrix as in (1).

III. Optimal Control Method for Determining Bounds
on the Step Response

The original derivation of step response bounds
in [1], though entirely correct, is somewhat obscure
and applies only to RC trees. The alternate derivation
outlined below yields essentially the same
results, but it applies to meshes as well and also
affords a natural way to incorporate additional
information and thereby obtain tighter bounds, as
shown in Section IV. Facts 1-4 below parallel the
development in [1] and are given here for completeness.
Fact 3 will be examined more closely in
Section V.

Fact 1

For any three nodes i, j, k of an RC mesh,

    $r_{ii}\, r_{kj} \ge r_{ki}\, r_{ij}.$    (3)

The proof of (3), given in [7], generalizes the
argument for the special case of a tree in [1].

Fact 2

The zero-state step response of an RC mesh is
completely monotone, i.e.,

    $\dot v_j(t) \ge 0, \quad j = 1, \ldots, N, \quad \forall t \ge 0.$    (4)

The proof is in [9].

Fact 3

For any two nodes i and k of an RC mesh and any
instant t during the step response,

    $r_{ii}\,(1 - v_k(t)) \ge r_{ki}\,(1 - v_i(t)),$    (5)

    $r_{ki}\,(1 - v_k(t)) \le r_{kk}\,(1 - v_i(t)).$    (6)

Given Facts 1 and 2, the derivation of (5) and
(6) is identical to the corresponding derivation in [1], i.e.,
for (5) note that (using (2))

    $r_{ii}(1 - v_k) - r_{ki}(1 - v_i) = \sum_{j=1}^{N} (r_{ii} r_{kj} - r_{ki} r_{ij})\, C_j \dot v_j \ge 0,$

where the last inequality follows from Facts 1 and
2. The proof of (6) is similar.

At this point the strategy becomes one of
reduced order modelling with time-domain error
bounds. Choosing a distinguished node i as the
output node of interest, we seek to describe the
system in terms of only two state variables, the
distance to equilibrium $(1 - v_i(t))$ and its integral

    $f_i(t) = \int_t^{\infty} [1 - v_i(t')]\, dt' = \sum_{k=1}^{N} r_{ik} C_k [1 - v_k(t)],$    (7)

where the last equality follows upon substituting
(2) into the integral and evaluating. Using (5)
and (6) in (7) yields the following inequality
between these two state variables:

    $T_{Ri}\,(1 - v_i(t)) \le f_i(t) \le T_P\,(1 - v_i(t)), \quad \forall t \ge 0,$    (8)

where $T_{Ri} = \sum_k r_{ki}^2 C_k / r_{ii}$ and $T_P = \sum_k r_{kk} C_k$.
From (7) one initial condition is

    $f_i(0) = \sum_{k=1}^{N} r_{ik} C_k = T_{Di}.$    (9)

It was shown in [1] that step response bounds
can be obtained by appropriate manipulations of
(4,5) and (7-9) above, but the methodology is somewhat
obscure. We believe a clearer view emerges
from recasting the calculations into the form of a
linear minimum- (and maximum-) time optimal control
problem with state constraints, in which an input
u(t) is introduced to represent the unknown waveform
$-\dot v_i(t)$:

    Minimize (or maximize) T

for the dynamical system

    $\dot f_i(t) = -(1 - v_i(t)),$    (10)

    $\frac{d}{dt}(1 - v_i(t)) = u(t),$    (11)

with initial conditions

    $f_i(0) = T_{Di}, \quad (1 - v_i(0)) = 1,$    (12)

state constraints

    $T_{Ri}\,(1 - v_i(t)) \le f_i(t) \le T_P\,(1 - v_i(t)), \quad \forall t \ge 0,$    (13)

input constraint

    $u(t) \le 0, \quad \forall t \ge 0,$    (14)

and terminal condition (15) that $v_i$ reach the target voltage $v_i^*$ at time T.

Fig. 3: Fastest and slowest trajectories from the
initial state to the target region, subject to the
state constraints indicated by dotted lines.

The optimal trajectories can be determined by
inspection without recourse to Pontryagin's maximum
principle, since the time duration of any path in
the $(1 - v_i)$-$f_i$ plane can be found by rearranging
and integrating the state equation $\dot f_i = -(1 - v_i)$ to yield

    $T = \int_{f_{i,\mathrm{final}}}^{f_{i,\mathrm{init}}} \frac{d f_i}{1 - v_i}.$    (16)

Fig. 4: The bounds approach a well-defined limit
for a distributed network such as this one, for
which $T_{Ri} = 1.33$ ns, $T_{Di} = 1.5$ ns, $T_P = 2.0$ ns,
and $\tau_i = 0.33$ ns.


Thus the fastest trajectory from the initial point
to the target interval is the one for which both the
region of integration $[f_{i,\mathrm{final}}, f_{i,\mathrm{init}}]$ and the integrand
$(1 - v_i)^{-1}$ are minimized, and the slowest
trajectory is found similarly. See Fig. 3. The
minimum and maximum times depend on the "target"
voltage $v_i^*$ and are denoted $T_{\min}(v_i^*)$ and $T_{\max}(v_i^*)$.
The inverse functions, denoted respectively $\bar v_i(t)$
and $\underline v_i(t)$, are readily seen to be the upper and lower
bounds, respectively, for all feasible solutions to
the optimal control problem and hence for the step
response of the mesh. Furthermore, these are the
best possible bounds we could construct using the
information contained in (10)-(15), since they are
attained by feasible trajectories. The algebraic
form of $\underline v_i(t)$ and $\bar v_i(t)$ obtained in this way can be
easily read off from Fig. 3 and agrees with the
results in [1]; the exact expressions are omitted
for the sake of brevity. They approach a well-defined
limit in the case of a distributed network,
e.g., the simple example in Fig. 4, for which they
are plotted in Fig. 5.

Fig. 5: Step response bounds for the network in
Fig. 4, with output taken at node i.

The reader can check that the "Elmore time
constant" $T_{Di}$ is the first moment of the impulse
response and therefore a reasonable estimate of the
delay. The step response estimate $v_{i,\mathrm{est}}(t) = 1 - \exp(-t/T_{Di})$,
discussed in [8,10], corresponds
to a straight line trajectory from the initial
condition to the origin in Fig. 3 and is therefore
a feasible (but not optimal) solution to the optimal
control problem, i.e., $\underline v_i(t) \le v_{i,\mathrm{est}}(t) \le \bar v_i(t)$,
$\forall t \ge 0$, for every mesh. It is readily seen that
$T_{Ri} \le T_{Di} \le T_P$ always and that the estimate and
bounds represent an effort to approximate the
dynamics of a higher order network by one with a
single time constant $T_{Di}$; they are exact only in that
case. Whenever $(T_P - T_{Ri}) \ll T_{Di}$, the wedge-shaped
region in Fig. 3 is quite narrow and the bounds will
be quite tight. Chapter 3 of [8] gives examples of
networks for which the bounds are good and others
where they are poor.
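
As a numerical illustration of the three time constants, the sketch below evaluates them for a hypothetical uniform RC line discretized into equal lumps. Everything specific here is an assumption made for the example: the function names, the total product R*C = 4 ns, the choice of the midpoint as output node, the line's resistance-matrix entries $r_{kj} = (R/n)\min(k, j)$, and the definitions $T_P = \sum_k r_{kk} C_k$, $T_{Di} = \sum_k r_{ki} C_k$, $T_{Ri} = \sum_k r_{ki}^2 C_k / r_{ii}$, which follow the usage in [1] as far as can be judged from the text.

```python
import math

def line_time_constants(n_seg, i, r_total=1.0, c_total=4.0):
    """Return (T_Ri, T_Di, T_P) for a uniform RC line with n_seg lumps.

    Node k (1..n_seg) sits after k series resistors of value r_total/n_seg
    and carries a capacitor c_total/n_seg; node i is the output node."""
    r_seg = r_total / n_seg
    c_node = c_total / n_seg

    def r(k, j):
        # Resistance shared by the source-to-k and source-to-j paths.
        return r_seg * min(k, j)

    nodes = range(1, n_seg + 1)
    t_p = sum(r(k, k) * c_node for k in nodes)
    t_d = sum(r(k, i) * c_node for k in nodes)
    t_r = sum(r(k, i) ** 2 * c_node for k in nodes) / r(i, i)
    return t_r, t_d, t_p

def single_pole_estimate(t, t_d):
    """Single-time-constant step response estimate 1 - exp(-t / T_Di)."""
    return 1.0 - math.exp(-t / t_d)
```

For a fine discretization with the output at the midpoint, the three values come out close to 4/3, 3/2 and 2 (in the units of R*C), and they respect the ordering $T_{Ri} \le T_{Di} \le T_P$ stated above.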


IV. Method "A" for Bounds Improvement: Limits on
the Maximum Slew Rate of Node Voltages


The optimal trajectories shown in Fig. 3
include horizontal segments along which $v_i$ changes
while $f_i$ remains constant. Since $\dot f_i = -(1 - v_i) < 0$,
these segments correspond to instantaneous jumps
in $v_i$. We can therefore improve
the bounds by adding constraints eliminating such
trajectories. The simplest form for such a
constraint is a "minimum slope bound" in the
$(1 - v_i)$-$f_i$ plane of the form

    $\frac{d f_i}{d(1 - v_i)} \ge \tau_i \ge 0.$    (17)

This rules out both trajectories in Fig. 3 as
feasible solutions. The new optimal trajectories
are as shown in Fig. 6, and the corresponding
algebraic form for $\underline v_i(t)$ and $\bar v_i(t)$ is given in [11].

The inequality (17) corresponds to a "slew-rate
bound," i.e., a bound on the derivative, for
$v_i$, since

    $\frac{d f_i}{d(1 - v_i)} = \frac{\dot f_i}{\frac{d}{dt}(1 - v_i)} = \frac{-(1 - v_i)}{-\dot v_i} = \frac{1 - v_i}{\dot v_i}.$    (18)

Fig. 6: The slope constraint (17) alters the optimal
trajectories in Fig. 3 as shown.

For any mesh we know that $(1 - v_i)/\dot v_i \ge r_{ii} C_i$, since

    $1 - v_i = \sum_{j=1}^{N} r_{ij} C_j \dot v_j \ge r_{ii} C_i \dot v_i,$ from (2) and (4).

Using $\tau_i = r_{ii} C_i$ can significantly tighten the
bounds whenever the mesh contains only a small number
of lumped capacitors, as is commonly the case
in reasonably accurate circuit models for distributed
interconnect [12]. But as progressively more R's
and C's are used to model a given section of interconnect,
$C_i \to 0$ and (17) becomes useless with
$\tau_i = r_{ii} C_i$.

Fortunately, values of $\tau_i$ greater than $r_{ii} C_i$
can be found for many RC trees. Space constraints
limit us to mentioning only one of the results in
this direction obtained in [13]. Consider an RC
line with the nodes numbered in increasing order as
one moves away from the source. It was first noted
in [8] that for such a network

    $\frac{\dot v_k(t)}{1 - v_k(t)} \ge \frac{\dot v_i(t)}{1 - v_i(t)}, \quad k \le i, \quad \forall t \ge 0;$    (19)

a rigorous proof was given in [14], and the result
extended in [13] to include all nodes of an RC tree
up to the first branch point. Using (19) and (5)
in (2), one can show that if i is any node of an
RC line, or any node of an RC tree between the
source and the first branch point, then (17) holds with

    $\tau_i = \sum_{k=1}^{i} \frac{r_{ki}^2 C_k}{r_{ii}}.$    (20)
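
The step from (19) and (5) to the slew-rate constant can be sketched as follows. This is a reconstruction offered under assumptions, not the proof of [13]: it takes (19) in the form $\dot v_k/(1 - v_k) \ge \dot v_i/(1 - v_i)$ for nodes $k \le i$, takes (5) in the form $r_{ii}(1 - v_k) \ge r_{ki}(1 - v_i)$, and uses the symmetry $r_{ik} = r_{ki}$ of the resistance matrix together with the monotonicity $\dot v_k \ge 0$ from (4).

```latex
\begin{align*}
1 - v_i &= \sum_{k=1}^{N} r_{ik} C_k \dot v_k
         \;\ge\; \sum_{k=1}^{i} r_{ik} C_k \dot v_k
         && \text{by (2) and (4)} \\
\dot v_k &\ge \frac{1 - v_k}{1 - v_i}\,\dot v_i
         \;\ge\; \frac{r_{ki}}{r_{ii}}\,\dot v_i
         && \text{by (19), then (5), for } k \le i \\
1 - v_i &\ge \Bigl(\sum_{k=1}^{i} \frac{r_{ki}^{2} C_k}{r_{ii}}\Bigr)\dot v_i
         \;=\; \tau_i\,\dot v_i .
\end{align*}
```

Dividing the last line by $\dot v_i$ gives $(1 - v_i)/\dot v_i \ge \tau_i$, which by (18) is the slope bound (17).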

The dark curve in Fig. 5 shows the original bounds
in [1] for the network in Fig. 4, along with the
improvement one obtains from using the slew-rate
bound (20).

V. Method "B" for Bounds Improvement: Spatial
Convexity of Node Voltages

At any instant during the step response of an
RC line or tree, the node voltages are a convex
function of distance from the source. This is a
consequence of the monotone charging of capacitors,
indicated in (2). For the network in Fig. 4, a
characteristic voltage profile is plotted in Fig.
7, where the "distance" from a point x to the
source is represented by the resistance $r_{xx}$. The
inequalities (5) and (6) bound the node voltages
elsewhere in the network in terms of the node
voltage $v_i$ of interest and are also plotted in
Fig. 7 for this RC line. They are quite different
in character: (5) gives a spatially convex profile
in this case but (6) does not. Considerable
improvement over (6) is possible since a convex
curve is bounded below by any tangent line, i.e.,

    $1 - v_k \le (1 - v_i)\left[1 + \lambda\left(\frac{r_{kk}}{r_{ii}} - 1\right)\right]$    (21)

for some $\lambda \in [0,1]$. Substituting (21) into the right
hand side of (7) and taking the maximum over $\lambda$ yields

Fig. 7: Typical convex spatial voltage profile for
the network in Fig. 4, along with the bounds (5)
and (6).

fi  I  <1"V  ““  {Td  '  l  IT  (1-V  )  , 

i  k  ii 


thus  reducing  the  effective  value  of  Tp  (from  2.00 
ns.  to  1.83  ns.  for  the  network  in  Fig.  4)  and 
further  improving  the  voltage  bounds  as  shown  in 
Fig.  5.  Current  research  includes  extending  this 
technique  to  trees. 

VI. Concluding Remarks

The research in this paper was stimulated by
recent work that appeared in [1,8]. The new
developments reported here are 1) two lemmas
[7,9] that provide a rigorous basis for extending
the theory from RC trees to RC meshes, 2) the
optimal control formulation of the problem [13],
3) an extension and rigorous proof [13] of a bound
on node voltage slew rates, 4) a systematic method
[11] for finding tighter step response bounds using
slew rate limits, and 5) method "B" for bounds
improvement.

This paper describes a simple model for the cost of implementing a circuit
in a multilayer integrated circuit (MLC) technology relative to the cost
required for a conventional single-plane version. The model indicates that MLC
technologies can be used to cost-effectively implement circuits that have
significantly more transistors than can be obtained with a single-plane
technology only if the availability of the third dimension for device placement
and interconnect results in a significant reduction in the total silicon area
used. A potential application for MLC technology is improving the speed of
circuits of moderate size by using the third dimension to reduce the length and
associated resistance and parasitic capacitance of interconnect lines. Examples
of these types of applications are discussed, as is the applicability of the
model to other three-dimensional integrated circuit technologies currently under
investigation.

Abstract: In this paper, we prove that maximum planar H-matching (the problem of determining
the maximum number of node-disjoint copies of the fixed graph H contained in a variable planar
graph G) is NP-complete for any connected planar graph H with three or more nodes. We also
show that perfect planar H-matching is NP-complete for any connected outerplanar graph H
with three or more nodes, and is, somewhat surprisingly, solvable in linear time for triangulated
H  with  four  or  more  nodes.  The  results  generalize  and  unify  several  special-case  results  proved 
in  the  literature.  The  techniques  can  also  be  applied  to  solve  a  variety  of  problems,  including  the 
optimal  tile  salvage  problem  from  wafer-scale  integration.  Although  we  prove  that  the  optimal  tile 
salvage  problem  and  others  like  it  are  NP-complete,  we  also  describe  provably  good  approximation 
algorithms  that  are  suitable  for  practical  applications. 


Key Words: Approximation Algorithm, Covering, Matching, NP-Complete, Optimal Tile Salvage